PREFACE


This is an introductory book on the Mathematics of stochastic
variables. The book deals with elementary Probability and
Statistics. It can be used as a book for self study or for a one
year slow course in Probability and Statistics. Since it is also
intended as a book for self study, the various symbols and letters
used are explained then and there throughout the book. The
prerequisite is one year of Calculus, but a person with high school
Mathematics can follow the book except for the few sections which
use Calculus extensively. In Chapter 1 an introduction to Set
theory and Linear Algebra is given. The prerequisite for this
chapter is only high school Mathematics. The later sections on
Linear Algebra may be omitted because they are not used very
much. These facts are mentioned in the relevant sections. An
attempt is made to make the book semi-rigorous.

There is a fairly balanced treatment of theory and applications
and this will give a sufficiently good background for further
studies. The book is based on the topics covered in an introductory
course in Statistics for the General Arts and Science
students at McGill University.

The book is intended for : 


(1) self study. 

(2) a two semester course in Probability and Statistics.

(3) a one semester course in Probability (Chapters 2-5). 

(4) a one year course in Indian universities. 

Special features : 

1. A new approach, built up on stochastic variables and
the operator called 'Mathematical Expectation'; uniformity
in notations.

2. The student's difficulty of distinguishing discrete, continuous,
finite, infinite, observed and hypothetical populations
is avoided by properly defining and developing the
theory, based on the theory of sets.

3. The dialogue is designed to suit the particular age group
of students who are likely to take the course.

4. A number of worked examples are given in each section 
and a good number of examples are taken from problems 
of day to day life. 

5. The development of the theory is very slow in the beginning
chapters, and the discussion is precise and kept to a
minimum in later chapters.

6. An insight into the advanced and various related topics
is given to the reader in every section.

7. Summary of correspondence between topics, important 
results, formulae etc. is given in every chapter. 


8. A good number of problems, some of which supplement the
theory already discussed, are given at the end of each section.

9. A unified theory of statistical inference is developed in
the last chapter.

10. Consistency is kept throughout; back references in the later
sections and repetitions are kept to a minimum.

11. Trivial results and results stated without proof are given
in the form of comments after illustrating them by examples.

12. Every subsection is illustrated by at least one worked
example.

A list of notations is given at the end of the book. In
numbering sections, equations and problems the following notations
are used. For example,

Problem 10.24 (24th problem in chapter 10).

Section 2.3.2 (second subsection of the 3rd section of chapter 2).

Equation (8.21) (21st equation in chapter 8).

Answers to the even numbered questions in the whole book and to
almost all questions in chapters 1-10 are given at the end of the
book. Answers to almost all even numbered questions have been
rechecked.


Several people have spent their precious time in helping the
author to complete the book. The author wishes to express his
sincere thanks to the following professors: Dr. Mir M. Ali, Dr. J.C.
Ahuja, Dr. V. Seshadri, Dr. D. Dawson and Dr. B.D. Aggarwala for
their valuable comments on some sections, and Dr. M. Stephens,
Dr. M. Csorgo, Dr. E. Saleh, Dr. G.P. Patil, Dr. J.K. Wani and Dr.
R.K. Saxena for the interesting discussions on some topics. The
author extends his heartfelt thanks to Prof. T.D. Dwivedi for
helping to proofread the materials and for checking
the answers of some exercises. The author would also like to thank
Dr. P.P. Singh, Dr. N.K. Mathur and Shri S.K. Agrawal for their
help in making a detailed index, Miss H. Schroeder for taking up
the hard job of typing the first draft of the manuscript, Mr. M.
Yalovsky and Mrs. F. Gordon for their comments, and the National
Research Council of Canada, Prof. E.M. Rosenthall and the
Department of Mathematics, McGill University for the financial
assistance in computing a few tables.

The author is indebted to the Literary Executor of the late
Sir Ronald A. Fisher, F.R.S., Cambridge, and to Oliver and Boyd
Ltd., Edinburgh, for their permission to reprint Table No. 5 from
their book.
CHAPTER 1


INTRODUCTION 


1.0. Introduction. In this chapter some of the basic ideas
which are required for the developments in later chapters are
discussed. A reader who is familiar with the definitions of set,
population, sample, vector and matrix may omit this chapter.
Only the definitions and elementary properties of sets, populations,
vectors and matrices are considered. Most of these results
are utilized in later chapters.


1.1. SETS

1.11. Definition. A collection or an aggregate of well-defined
objects is called a set. The objects which belong to the
set are called elements of the set. Sets are usually denoted by
capital letters A, B, C etc., and their elements by small letters
a, b, c etc. Set, aggregate, group, collection etc. are synonyms in
ordinary language but in mathematical language they have different
meanings. 'Well-defined' here means that any object may
be classified as either belonging to the set A or not belonging to
the set A.


Notations. a ∈ A means that a is an element of the set A,
where ∈ is the Greek letter 'epsilon'.

b ∉ A means that b is not an element of the set A.

A = {0, 1, -100} means that A is a set with elements 0, 1
and -100. 0 ∈ A, 1 ∈ A and -100 ∈ A, but for example 20 ∉ A.

B = {x | -10 ≤ x ≤ 25} means that B is the set of all points
which lie between -10 and 25 (including end points). This defines
a closed interval, [-10, 25].

C = {(x, y) | 2x + 3y = 5} means the set of all paired
values (x, y) for which the equation 2x + 3y = 5 is satisfied.

D = {(a, b) | a, b ∈ A} means the set of all pairs of elements
(a, b) where a and b are elements of a set A.
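
The notations above can be tried out with Python's built-in set type. This is only an illustrative sketch: the variable names are chosen here, and for C a finite grid of integers (an assumption) stands in for the real line, since a Python set cannot hold infinitely many points.

```python
# A = {0, 1, -100}: membership is tested with `in` and `not in`.
A = {0, 1, -100}
print(0 in A, 20 in A)

# C = {(x, y) | 2x + 3y = 5}, restricted to integers from -10 to 10.
# The restriction is an assumption made only to keep the set finite.
grid = range(-10, 11)
C = {(x, y) for x in grid for y in grid if 2 * x + 3 * y == 5}

# D = {(a, b) | a, b in A}: all ordered pairs of elements of A.
D = {(a, b) for a in A for b in A}
print(len(D))
```

The set-builder notation {... | condition} maps directly onto a set comprehension with an `if` clause.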

Ex. 1.11.1. The set of numbers between 1 and 3.

Comments. This set contains an infinite number of elements
since there are an infinite number of numbers between 1 and 3.

Ex. 1.11.2. The set of definitions in a particular text book.

Comments. This set contains a finite number of elements.
It may be noticed that the elements of a set need not be numbers
and that a set may contain a finite or an infinite number of
elements.

Ex. 1.11.3. The set of assumptions for a particular mathematical
statement.

Comments. The elements of a set can be real or abstract
quantities, animate or inanimate objects etc.

Ex. 1.11.4. The set of books, pens and students in a class
room.

Comments. A set may contain different types of objects
or objects having different characteristics. If Mr. Fox is a student
in the class he is an element of the set. That is, a ∈ A where a
is Mr. Fox and A is the set of books, pens and students under
consideration. The reader is advised to construct some examples.


1.12. Sets and Populations. A set whose every element can
be characterized by K characteristics may be called a K-variate
population, where K is a number greater than or equal to one (K ≥ 1).
If K = 1 it is a univariate population; if K = 2 it is a bivariate
population, etc. We will start with this definition for a population
and as we proceed further we will discuss populations defined
in other ways, for example populations defined by a stochastic
variable. It may be noticed that this notion of population is not
always the same as the notion of population used in ordinary
conversation.

Ex. 1.12.1. The set of heights of all the students in a class.

Comments. This is a finite population because the set
contains a finite number of elements. Every element is
characterized by one characteristic, namely, 'height measurement' of
a student. This is an example of a one variate or univariate
population.

Ex. 1.12.2. The set of measurements of height, weight and length
of right arm of the citizens in a particular city at a particular time.

Comments. Every element of this set is a collection of
three numbers (height, weight and length of right arm) or each
element is specified by three characteristics. This is an example
of a trivariate or three variate finite population.

Ex. 1.12.3. The set of true effects of a particular drug on all
the animals of a particular category.

Comments. This may be considered as a univariate but
hypothetical population. According to the definition it may be
noticed that a population may be finite or infinite, real or
hypothetical, discrete (individually distinct) or continuous. The reader
may construct three examples each of (a) univariate populations,
(b) bivariate populations, (c) sets which cannot be considered as
populations at all.

1.13. Outcome Set. The set whose elements are all possible
outcomes of an experiment is called an outcome set. This outcome
set is also called sample space, possibility space, universal set,
sure event etc. Thus an outcome set is a particular type of set,
where the elements are the possible outcomes of an experiment.
An experiment may be defined as a procedure which results in
some outcomes in a particular situation. An outcome is a single
realization of a phenomenon under consideration, under the
assumptions and notations used for the procedure (experiment).
The outcomes need not always be numbers or quantities which
are representable in terms of numbers. A philosophical discussion
of experiment and outcomes is not attempted here.

The outcomes of an experiment may be represented in different
ways. In other words the possible geometrical or algebraic or
other representation of the outcome set is not unique. That
representation where the elements do not represent more than one
distinct outcome in some sense is usually taken as the outcome
set. Thus an outcome set does not allow any subdivision of its
elements.


Ex. 1.13.1. Consider an experiment of throwing a coin twice.
Let one side (say, head) be denoted by 1 and the other side (say, tail)
be denoted by 0. If we rule out the possibility of the coin standing
on its edge, then the possible outcomes are (0, 0), (0, 1), (1, 0), (1, 1)
where the first element in a bracket denotes the result on the first
trial. Here the outcome set is the set of 4 ordered pairs of numbers
given above.


Comments. These may be represented as four geometrical
points in a two dimensional space. Any single outcome here may
be represented by a pair of numbers (a, b) where a and b take
values 0 and 1, or where a and b are defined on the set {0, 1}. That
is, S = {(a, b) | a, b ∈ {0, 1}}, where S denotes the outcome
set. According to the above assumptions and terminology the outcome
sets, when a coin is thrown once, twice, ..., n times, are given as
follows.

No. of times the   Possible outcomes        Total number of   The outcome set
coin is thrown                              outcomes in the
                                            outcome set

Once               0, 1                     2                 {a | a ∈ {0, 1}}

Twice              (0, 0), (0, 1),          2^2 = 4           {(a, b) | a, b ∈ {0, 1}}
                   (1, 0), (1, 1)

Thrice             (0, 0, 0), (0, 0, 1),    2^3 = 8           {(a, b, c) | a, b, c ∈ {0, 1}}
                   (0, 1, 0), (0, 1, 1),
                   (1, 0, 0), (1, 0, 1),
                   (1, 1, 0), (1, 1, 1)

...                ...                      ...               ...

n times            (0, 0, ..., 0), ...,     2^n               {(a_1, a_2, ..., a_n) |
                   (1, 1, ..., 1)                             a_1, a_2, ..., a_n ∈ {0, 1}}


If we consider the geometrical representation of the outcomes
when the coin is tossed n times, we get 2^n points in an n-dimensional
space. (a_1 is read as a one etc.) The reader may evaluate
the outcome sets when a die is thrown once, twice, etc. (A die is
a cube with the six faces marked with the numbers 1, 2, 3, 4, 5, 6.)
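
The outcome sets for repeated throws can be enumerated mechanically. The sketch below is one possible way to do it in Python, using `itertools.product`; the function name `outcome_set` and its parameters are chosen here for illustration.

```python
from itertools import product

def outcome_set(n, faces=(0, 1)):
    """Return the set of all ordered n-tuples of faces (n throws)."""
    return set(product(faces, repeat=n))

# Coin thrown once, twice, thrice: the sizes are 2, 4 and 8.
for n in (1, 2, 3):
    print(n, len(outcome_set(n)))

# A die thrown twice: ordered pairs of face numbers.
die_twice = outcome_set(2, faces=range(1, 7))
print(len(die_twice))
```

Note that for a single throw the elements are 1-tuples such as `(0,)`, so that every outcome set produced by the function has elements of the same type.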

1.14. Subsets. A set B is said to be a subset of a set A if all
the elements of B are also elements of A. This relationship is
denoted as B ⊂ A (B is contained in A or A contains B).

Ex. 1.14.1. B = the set of male students in a class,
A = the set of students in the same class.

Comments. If there are no male students in the class, then
B is an empty set. An empty set is called a null set and is usually
denoted by φ (a Greek letter called phi). So evidently φ ⊂ A where
A is any set. If all the students are male students then B is the
same as A. That is, equality of two sets A and B may be defined as
A = B if A ⊂ B and B ⊂ A. It may also be noticed that A ⊂ A, or
any set is a subset of itself.
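
The subset relations of this section can be checked on a small finite example. The sketch below uses made-up student names (an assumption for illustration only); Python's `<=` operator on sets tests "is a subset of".

```python
# A: students in a class; B: the male students among them (names are invented).
A = {"Ann", "Bob", "Carl", "Dina"}
male = {"Bob", "Carl"}
B = {s for s in A if s in male}

empty = set()          # the null set

def sets_equal(P, Q):
    """A = B exactly when A is a subset of B and B is a subset of A."""
    return P <= Q and Q <= P

print(B <= A)          # B is a subset of A
print(empty <= A)      # the null set is a subset of any set
print(A <= A)          # any set is a subset of itself
```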


Ex. 1.14.2. B = {x | 1 ≤ x ≤ 3} or the set of numbers between
1 and 3 (both 1 and 3 inclusive).

A = {x | 1 ≤ x ≤ 5} or the set of numbers between
1 and 5 (both 1 and 5 inclusive).

Comments. Here B ⊂ A. But it may be noticed that
both A and B are infinite sets or sets containing an infinite number of
elements.

1.15. Samples. If A is a population and if B ⊂ A then B is called
a sample with respect to A. According to
the definition B itself is a population if A and B are not
considered together. If the formulation of the results or inference
from B can be extended to A in some sense then B is called a
representative sample from A.

Ex. 1.15.1. The heights of students in a particular class are
a sample of heights of students in the university to which the class
belongs.

Comments. This is an example of a univariate population
and a sample from it. If in this sample 20% of the students have
heights over 5' 6" then the statement that 20% of the students in
the university have heights over 5' 6" cannot be made unless the
sample is representative of the population. In other words we
are not justified in making such a statement based on the sample
unless the sample is selected in such a way that such statements
are justifiable in some sense. This aspect will be discussed in more
detail in the chapter on sampling.

Ex. 1.15.2. The outcome of rolling 2 dice three times is a 
sample from the hypothetical population of all possible outcomes of 
throwing 2 dice. 

Comments. This is an example of a sample from a 
multivariate population. 

Ex. 1.15.3. The set of birds captured from a particular place
by an experimental scientist is a sample of birds at that place at
that time.

Comments. Generalization of his findings based on these
birds cannot be done unless the captured birds form a representative
sample.

1.16. Events. A subset A of an outcome set S is called an
event. That is, A ⊂ S. If a subset A contains only one element
(one outcome) then the event is called an elementary event.
Elementary events are single elements of the outcome set S.

Ex. 1.16.1. A geometrical representation of the outcome set of
rolling a die twice is given in Fig. 1.1.

Fig. 1.1.

The event of getting a total 2 (sum of the face numbers is 2) is
given by the point encircled in Fig. 1.1. The event of getting
a total of 2 or 3 or 4 is given by the set of points below the line AB.
The event of rolling 11 or 12 (getting a total of 11 or 12) is given by
the set of points inside the dotted closed curve.


Comments. The event of rolling 13 has no point or is a null
set. An event which is a null set is called an impossible event.
The event of getting a total of either 2 or 3 or 4 or ... or 12 is
given by the entire outcome set. This event is sure to happen in
any trial. Hence S ⊂ S is called a sure event.
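
The events described for two rolls of a die can be written out as subsets of the outcome set. The following is a small sketch in Python; the helper name `total_in` is chosen here and is not part of the text.

```python
from itertools import product

# Outcome set S for rolling a die twice: 36 ordered pairs.
S = set(product(range(1, 7), repeat=2))

def total_in(values):
    """Event: the sum of the two face numbers lies in `values`."""
    return {(a, b) for (a, b) in S if a + b in values}

total_2 = total_in({2})                 # elementary event {(1, 1)}
low = total_in({2, 3, 4})               # total of 2 or 3 or 4
high = total_in({11, 12})               # total of 11 or 12
impossible = total_in({13})             # null set: impossible event
sure = total_in(set(range(2, 13)))      # whole outcome set: sure event

print(len(low), len(high), len(impossible), sure == S)
```

Every event built this way is a subset of S, which is exactly the definition of an event given above.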


Ex. 1.16.2. Consider a dance party consisting of n couples. If
the ages of couples are represented as points in a plane then we get n
points. Here the outcome set consists of n points in the first quadrant
as shown in Fig. 1.2.

Fig. 1.2.

Consider the event of getting a couple from among the couples
where the boy is older than the girl. This event is given by points in
the shaded area. The event of getting a couple where the girl's age is
less than 25 is given by points below the line y = 25.


Ex. 1.16.3. Let the outcome set be symbolically represented by
points inside a square S; then the events A and B may be represented
by the sets of points in the closed curves α and β as shown in Fig. 1.3.

Fig. 1.3.

Such diagrammatic representations of sets are called Venn diagrams.
In many problems it will be very convenient to deal with events and
probabilities of events if a diagrammatic representation is available.


Exercises

1.1. Construct 3 examples each of a set which is (a) discrete, (b)
continuous, (c) real, (d) hypothetical, (e) finite, (f) infinite.

1.2. Find 5 sets each of which may be considered as
(a) univariate populations, (b) bivariate populations.


consisting of two 


1.3. In an experiment of rolling a die twice, write down the outcome
set as a set of vectors.

1.4. Define the outcome set in an experiment of (a) drawing a card
from a deck of 52 cards, (b) taking two cards successively when the first one
is replaced before the second one is drawn, (c) taking two cards without
replacement.

1.5. An urn contains 10 black balls and 4 white balls. Define the
outcome set for the experiment of taking one ball.

1.6. An experiment consists of drawing a card from a well-shuffled
deck of 52 cards and then tossing a coin if a red card is drawn. Describe the
outcome set or sample space for this experiment.


1.2. VECTORS


1.21. Definition. If the elements of a set are arranged
according to some order of succession the set is called a vector.
The number of elements in the vector is called the order or the
size of the vector. If the elements are put in a column it is called
a column vector. In general a vector may be defined as an ordered
set. In the following discussion we are interested only in real
quantities. We are going to assume that the elements are all real
numbers. In general they need not be real numbers.

Ex. 1.21.1. Consider the integers between 1 and 4 (both inclusive).
That is, 1, 2, 3, 4. Then the arrangement (1, 2, 3, 4) is a
vector, (1, 3, 2, 4) is another vector, (4, 1, 2, 3) is a third vector etc.

Comments. From the definition it may be noticed that
two vectors are equal if and only if the corresponding elements
are all equal. If V_1 = (1, 2, 3) and V_2 = (1, 2, 3, 4) then equality is
not defined since they are of different order. A vector of order
one is called a scalar quantity. A reader who is familiar with
analytical geometry may notice that vectors may be considered
as geometrical points. All the vectors in Ex. 1.21.1 are row vectors
of order 4.

Ex. 1.21.2. Let x_1, x_2, ..., x_n denote the yields of wheat at n
places. Then the vector (x_1, ..., x_n) denotes a vector of wheat yields.

Comments. The usual notation for a vector is a simple
bracket notation. If the price of every unit of wheat (say per
bushel) is Rs. k then the money value of the wheat yields at the
different places may be represented by the vector (kx_1, ..., kx_n).
In general we can define a scalar multiplication of a vector by the
equation

k(x_1, ..., x_n) = (kx_1, ..., kx_n).

Ex. 1.21.3. Let f_1(x), f_2(x), ..., f_r(x) be functions of x; then
f = [f_1(x), f_2(x), ..., f_r(x)] is a vector of functions, of order r.

Comments. As the elements of a set can be any well-defined
objects the elements of the vector can also be any well-defined
objects.










1.22. Addition of Vectors. Let (a_1, a_2, ..., a_n) denote
the money value of wheat produced at n different places and let
(b_1, b_2, ..., b_n) be the money value of barley produced at the same n
places. Then the total money value of wheat and barley produced
at the different places is evidently (a_1 + b_1, a_2 + b_2, ..., a_n + b_n).
So in general we will define the sum of two vectors X = (x_1, x_2,
..., x_n) and Y = (y_1, y_2, ..., y_n) by
X + Y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n).

Ex. 1.22.1. (1, -1, 0) + (2, 3, 4) = (3, 2, 4).

Comments. It may be noticed that only vectors of the
same order and type can be added together. If X = (x_1, x_2, ..., x_n)
and 0 = (0, 0, ..., 0) then X + 0 = X. So there exists a vector 0
such that X + 0 = X. 0 may be called an identity element with
respect to the operation addition. A vector with all elements
equal to zero is called a null vector and is usually denoted by 0.
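
As an illustration, vector addition and scalar multiplication can be sketched in Python with tuples standing in for row vectors. The function names `add` and `scale` are chosen here; the check on the order mirrors the remark that only vectors of the same order can be added.

```python
def add(X, Y):
    """Sum of two vectors of the same order, element by element."""
    if len(X) != len(Y):
        raise ValueError("addition is defined only for vectors of the same order")
    return tuple(x + y for x, y in zip(X, Y))

def scale(k, X):
    """Scalar multiple kX = (k*x_1, ..., k*x_n)."""
    return tuple(k * x for x in X)

X = (1, -1, 0)
Y = (2, 3, 4)
null = (0, 0, 0)

print(add(X, Y))             # the sum in Ex. 1.22.1
print(add(X, null) == X)     # the null vector is an identity for addition
print(add(X, scale(-1, X)))  # X + (-1)X gives the null vector
```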


Ex. 1.22.2.

[ -7 ]   [  7 ]   [ 0 ]
[  5 ] + [ -5 ] = [ 0 ]
[ -6 ]   [  6 ]   [ 0 ]

But the sum of a column vector of order 3 and the row vector
[2, 6, -7] is not defined.

Comments. From the above example it may be noticed that
for every vector X there exists a vector Y such that X + Y = 0.
This Y may be called an inverse of X with respect to the operation
addition.

1.23. Vector Space. Consider a collection of vectors of
the same order and type such that the sum of any two vectors and
any scalar multiple of any vector in the collection are also
members of this collection. In other words the collection is closed
under the operations of addition and scalar multiplication. Then
the collection of vectors is called a vector space. That is, if Ω
(omega) denotes the collection and if V_1 and V_2 are any two elements
of Ω then V_1 + V_2 ∈ Ω and kV_1 ∈ Ω where k is any scalar
quantity.

Ex. 1.23.1. Let Ω be the collection of all vectors
V = (x_1, x_2, ..., x_5) where x_1, x_2, ..., x_5 are numbers. Evidently Ω is
a vector space.

Comments. A vector V is an element of a vector
space. In this example the vector space consists of all points in
a 5 dimensional Euclidean space. On the other hand if we consider
the collection of all vectors of order 5 and all vectors of order
4 then the collection is not a vector space. The sum of any two is not
in general an element of this collection because the sum is not in
general defined here.
1.24. Independence of Vectors. Consider the vectors
V_1 = (1, 2, 3) and V_2 = (2, 4, 6); then V_2 = 2V_1 or V_2 - 2V_1 = (0, 0, 0)
= 0. Here V_2 depends on V_1. If for two vectors V_1 and V_2,
V_1 + kV_2 = 0 for a non-zero scalar quantity k then V_1 and V_2 are
said to be linearly dependent. Let us consider the following
vectors,

T_1 = (1, 2, 3)

T_2 = (1, 2, 4)

Let us examine the relation k_1T_1 + k_2T_2 = 0 where k_1 and k_2
are unknown scalar quantities.

k_1T_1 + k_2T_2 = k_1(1, 2, 3) + k_2(1, 2, 4)

= (k_1 + k_2, 2k_1 + 2k_2, 3k_1 + 4k_2)

k_1T_1 + k_2T_2 = 0 ⇒ k_1 + k_2 = 0, 2k_1 + 2k_2 = 0,

and 3k_1 + 4k_2 = 0 ⇒ k_1 = 0 = k_2.

(Here the notation ⇒ means 'implies'. For example 2a = 2b
⇒ a = b. Similarly ⇏ means 'does not imply'.) A number of vectors
V_1, V_2, ..., V_r each of order n are said to be linearly independent
if k_1V_1 + k_2V_2 + ... + k_rV_r = 0 ⇒ k_1 = 0 = k_2 = k_3 = ... = k_r where
k_1, k_2, ..., k_r are scalar quantities. The maximum number of linearly
independent vectors in a vector space is called the dimension or rank of the
vector space. If from a set of linearly independent vectors all the
vectors in a particular vector space are available by the operations
of addition and scalar multiplication then that set of vectors is
called a basis of the vector space generated and the vector space
is said to be generated or spanned by the set of basis vectors.
The dimension of this vector space is the number of vectors in the
generating system or in a basis.

Ex. 1.24.1. Check whether the vectors V_1 = (1, 2, 3), V_2 = (0, 1, 5)
are independent.

Solution. Let k_1V_1 + k_2V_2 = 0 where k_1 and k_2 are scalar
quantities.

k_1V_1 + k_2V_2 = k_1(1, 2, 3) + k_2(0, 1, 5)

= (k_1, 2k_1 + k_2, 3k_1 + 5k_2).

If k_1V_1 + k_2V_2 = (0, 0, 0) = 0, then

k_1 = 0
2k_1 + k_2 = 0
3k_1 + 5k_2 = 0

⇒ k_1 = 0 and k_2 = 0.

The vectors V_1, V_2 are independent.
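
The check carried out by hand above can also be done mechanically. The sketch below is one possible approach, not the method of the text: it row-reduces the vectors with exact fractions and declares them independent when the number of non-zero rows left (the rank) equals the number of vectors. The function names are chosen here for illustration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors of equal order, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0                                  # number of pivot rows found so far
    ncols = len(rows[0]) if rows else 0
    for c in range(ncols):
        # Look for a row at or below position r with a non-zero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate column c from the rows below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(vectors):
    """True when k_1 V_1 + ... + k_r V_r = 0 forces every k_i to be zero."""
    return rank(vectors) == len(vectors)

print(independent([(1, 2, 3), (0, 1, 5)]))   # the vectors of Ex. 1.24.1
print(independent([(1, 2, 3), (2, 4, 6)]))   # V_2 = 2 V_1, so dependent
```

Exact fractions are used instead of floating point so that a zero entry is recognized as exactly zero.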
1.25. Unit Vectors. Consider the following vectors of
order n,

e_1 = (1, 0, 0, ..., 0),
e_2 = (0, 1, 0, ..., 0),
...
e_n = (0, 0, ..., 1).

These vectors e_1, e_2, ..., e_n are said to be unit vectors of
order n. Any vector

X = (x_1, x_2, ..., x_n)

may be written as a linear combination of these unit vectors in
the form,

x_1e_1 + x_2e_2 + ... + x_ne_n

= x_1(1, 0, 0, ..., 0) + x_2(0, 1, 0, ..., 0) + ... + x_n(0, 0, ..., 1)

= (x_1, 0, 0, ..., 0) + (0, x_2, 0, ..., 0) + ... + (0, 0, ..., x_n)

= (x_1, x_2, ..., x_n) = X.

Further it is easily seen that e_1, e_2, ..., e_n are independent
and therefore they form a basis of an n-dimensional space.

1.26. Orthogonal Vectors. Two vectors X = (x_1, x_2, ..., x_n)
and Y = (y_1, y_2, ..., y_n) are said to be orthogonal if the sum

x_1y_1 + x_2y_2 + ... + x_ny_n = 0.

This particular sum is called the inner product of the two
vectors X and Y and is usually denoted by XY' (X, Y prime) or
YX' (Y, X prime). If in a system of vectors every pair of different
vectors are orthogonal then the system is called an orthogonal
system of vectors. If Y is the same as X then the inner product
XY' reduces to XX' = x_1^2 + x_2^2 + ... + x_n^2.

The positive square root of this XX' is usually called the
length or norm of the vector X and is usually denoted by ||X|| (norm
of X), i.e.

||X|| = √(XX')

= (x_1^2 + x_2^2 + ... + x_n^2)^(1/2).

It may be noticed that all the vectors e_1, e_2, ..., e_n have
lengths unity. If in any vector X, ||X|| = 1 then the vector is
called a normal vector. If Y = X/||X|| then Y is a normalized
vector of X, since ||Y|| = 1.
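
A short numerical sketch of the inner product, the norm and normalization follows. The function names are chosen here for illustration, and `normalize` rejects the null vector, for which X/||X|| is not defined.

```python
import math

def inner(X, Y):
    """Inner product x_1*y_1 + ... + x_n*y_n of two vectors of the same order."""
    if len(X) != len(Y):
        raise ValueError("vectors must be of the same order")
    return sum(x * y for x, y in zip(X, Y))

def norm(X):
    """Length of X: the positive square root of the inner product of X with itself."""
    return math.sqrt(inner(X, X))

def normalize(X):
    """Return X / ||X||, a vector of length one."""
    length = norm(X)
    if length == 0:
        raise ValueError("the null vector cannot be normalized")
    return tuple(x / length for x in X)

e1, e2 = (1, 0, 0), (0, 1, 0)
print(inner(e1, e2))            # unit vectors are orthogonal
print(norm((3, 4)))             # length of (3, 4)
print(norm(normalize((3, 4))))  # a normalized vector has length one
```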

Ex. 1.26.1. The unit vectors e_1, e_2, ..., e_n form an orthogonal
system of vectors.

Comments. If a basis of a vector space is orthogonal it is
called an orthogonal basis. It may be noticed that e_1, e_2, ..., e_n form
an orthonormal basis for an n-dimensional vector space (orthonormal
means orthogonal and normal).

Exercises 

.»bsot a ‘„LLT <£, U vtctLVorcti: ~ 5 - °* 6 - 7 - 8 - - 10 >- Find <■> aU 

events o?’o^ttina of throwing a balanced coin 3 times, find the 

heads. Represent U tl °”ts ottfJoL tW ° tai ' S ' <C > le "‘ ^ 
1.9. In an experiment of rolling a die once, find the event of getting
an even number.

1.10. Find the sum and inner product of the vectors V_1 = (1, 2, 5),
V_2 = (-5, 6, 0).

1.11. Construct a vector V_2 orthogonal to the vector V_1 = (5, 6, -7, 0);
show that every scalar multiple of V_2 is also orthogonal to V_1.

1.12. Find two vectors V_1 and V_2 which are orthogonal and both are
orthogonal to the vector V_3 = (1, 1, -1).

1.13. Check the independence of the vectors V_1 = (1, 0, -1),
V_2 = (2, 1, -1) and V_3 = (3, 1, -2).

1.14. Show that the vectors V_1 = (1, 0, 1), V_2 = (2, 0, 1) and V_3 = (1, 1, 1)
can form a basis of a vector space of dimension 3.

1.15. Find the co-ordinates of the vector (1, 1) relative to a basis
(1, 0) and (2, 1).

1.3. MATRICES AND LINEAR EQUATIONS 

1.31. Linear Equations. Consider the following set of
linear equations,

y_1 = 5x_1 - 2x_2 + x_3      ...(1)

y_2 = 2x_1 + 7x_2 - 4x_3     ...(2)

These equations are completely specified by the coefficients
5, -2, 1 and their order in the first equation and the coefficients
2, 7, -4 and their order in the second equation. That is, the
equations are completely specified by the vectors (5, -2, 1) and
(2, 7, -4), or if these two vectors are given then a transformation
of the quantities x_1, x_2, x_3 into y_1 and y_2 can be written down. If we
are interested in the order in which y_1 and y_2 occur, say for example
y_1 = 5x_1 - 2x_2 + x_3 is written as the first equation and the other
one as the second equation, then the equations as well as their order
are specified by the arrangement of vectors in the form,

[ 5  -2   1 ]
[ 2   7  -4 ]

where the first row denotes the coefficients in the first equation
and the second row denotes the coefficients in the second equation.
In general a system of m linear equations in n unknowns and
their order are specified by the arrangement of coefficients in the
form,

[ a_11, a_12, ..., a_1n ]
[ a_21, a_22, ..., a_2n ]
[ ...                   ]
[ a_m1, a_m2, ..., a_mn ]

where the i-th row denotes the coefficients in the i-th equation,
for i = 1, 2, ..., m and where for example a_12 is read as a one, two
etc.
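
To illustrate how the arrangement of coefficients alone specifies the transformation, the sketch below stores the two coefficient vectors as rows and computes y_1 and y_2 for a given (x_1, x_2, x_3). The names `coeffs` and `transform` are chosen here for illustration.

```python
# Row i holds the coefficients of the i-th equation.
coeffs = [
    (5, -2, 1),    # y_1 = 5x_1 - 2x_2 + x_3
    (2, 7, -4),    # y_2 = 2x_1 + 7x_2 - 4x_3
]

def transform(coeffs, x):
    """Return (y_1, ..., y_m), where y_i is the inner product of row i with x."""
    return tuple(sum(a * xj for a, xj in zip(row, x)) for row in coeffs)

print(transform(coeffs, (1, 1, 1)))
print(transform(coeffs, (1, 0, 0)))   # picks out the first column of coefficients
```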
1.32. Matrices. The theory of matrices has become a useful
tool in almost all branches of applied mathematical activities.
Here we will define matrices and discuss some of the elementary
properties of matrices. An arrangement of m row vectors of order n
into m rows is called a matrix. This can also be considered to be
an arrangement of n column vectors of order m into n columns.
The following notations are used.

(1) A matrix is written as an array of elements enclosed in
brackets, [ ]. For example,

    [  2   3   5 ]
A = [ -1   0   4 ]
    [  5   7  -2 ]

is a matrix A which is an arrangement of 3 row vectors of order 3
into 3 rows.

(2) A = (a_ij). This notation means that the i-th row, j-th
column element is a_ij. In the above example a_11 = 2, a_12 = 3, a_32 = 7
etc.

(3) A(m×n). This notation means that there are m rows
and n columns in the matrix A.

Comments. When the number of rows = the number of
columns = n, the matrix is called a square matrix of order n; otherwise
it is called a rectangular matrix. If all the elements of a
matrix are zero it is called a null matrix, and is usually denoted
by 0.


1.33. Samples from a Multivariate Population. From
the definition of a multivariate population it is evident that every
element in the set designating a k-variate population has k
components, namely, the k characteristics which define a k-variate
population. If a sample of size n or a subset containing n elements
(sometimes called n observations) is taken then we get a matrix of
the following form

[ x_11  x_12  ...  x_1n ]
[ x_21  x_22  ...  x_2n ]
[ ...                   ]
[ x_k1  x_k2  ...  x_kn ]

where for example the first column denotes the first element with
k components etc.

Ex. 1.33.1. Consider the height and weight measurements of
all students (say n) in a class. This may be written as

[ x_11, x_12, ..., x_1n ]
[ x_21, x_22, ..., x_2n ]

where the column vectors

[ x_11 ]   [ x_12 ]        [ x_1n ]
[ x_21 ],  [ x_22 ], ...,  [ x_2n ]

represent n observations on a bivariate population.

Comments. If these n students are considered to be a
sample of students in a university then the height and weight
measurements of the students in this university form a bivariate
population. Here every element in the population has two
components and hence the population is bivariate.

1.34. Stochastic Matrices. These are special types of
matrices whose elements satisfy some conditions. A singly stochastic
matrix A = (a_ij) may be defined as a matrix whose elements a_ij
satisfy the conditions: (1) a_ij ≥ 0 for all i and j, (2) Σ_j a_ij = 1
for all i, or Σ_i a_ij = 1 for all j.

Comments. From conditions (1) and (2) it follows that
0 ≤ a_ij ≤ 1 for all i and j. Here Σ (sigma) is a notation for a sum.
For example,

Σ_{i=1}^{5} b_i = b_1 + b_2 + ... + b_5.

Σ_{i,j=1}^{n} b_ij = Σ_{i=1}^{n} Σ_{j=1}^{n} b_ij = Σ_{i=1}^{n} (Σ_{j=1}^{n} b_ij)
= Σ_{i=1}^{n} (b_i1 + b_i2 + ... + b_in)
= Σ_{i=1}^{n} b_i1 + ... + Σ_{i=1}^{n} b_in
= (b_11 + b_21 + ... + b_n1) + (b_12 + ... + b_n2) + ... + (b_1n + ... + b_nn)
= b_11 + b_21 + ... + b_nn.

Σ_{i=1}^{2} Σ_{j=1}^{4} b_ij = b_11 + b_12 + b_13 + b_14 + b_21 + b_22 + b_23 + b_24
= Σ_{j=1}^{4} Σ_{i=1}^{2} b_ij.


Ex. 1.34.1. 


A = 


'0 

1 


i 

0 

* 


¥ 

£ 

i 


Comments. Here the sum of the elements in any row is
unity. If in a stochastic matrix the rows and columns satisfy
the conditions that the sum of the elements in any row or column
equals unity, then the matrix is called doubly stochastic.


Ex. 1.34.2. 


B= 


i * 

I * 0 
L 0 * f 


Comments. Here the sum of the elements in any row or 
column equals unity. Hence B is doubly stochastic. Stochastic 
matrices are also called Markov matrices. The elements of a 
Markov matrix satisfy the conditions for probabilities and hence 
these matrices are called stochastic matrices. 
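
The conditions for stochastic matrices can be tested directly. The sketch below checks the row version of condition (2); the matrices P and Q are made up for illustration and are not the matrices of the examples above, and exact fractions are used so that sums such as 1/3 + 2/3 compare equal to 1 without rounding.

```python
from fractions import Fraction as F

def is_row_stochastic(M):
    """All elements non-negative and every row sums to one."""
    return (all(a >= 0 for row in M for a in row)
            and all(sum(row) == 1 for row in M))

def is_doubly_stochastic(M):
    """Row stochastic, and every column also sums to one."""
    columns = list(zip(*M))
    return is_row_stochastic(M) and all(sum(col) == 1 for col in columns)

# Illustrative matrices (assumed values).
P = [[F(1, 2), F(1, 2), 0],
     [0, F(1, 2), F(1, 2)],
     [F(1, 2), 0, F(1, 2)]]
Q = [[0, 1, 0],
     [F(1, 3), 0, F(2, 3)],
     [F(1, 4), F(1, 4), F(1, 2)]]

print(is_row_stochastic(P), is_doubly_stochastic(P))
print(is_row_stochastic(Q), is_doubly_stochastic(Q))
```

A matrix whose columns (rather than rows) sum to one can be checked by applying `is_row_stochastic` to its transpose.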

1.35. ALGEBRA OF MATRICES
The development of the ideas in the following sections till
the end of this chapter is very fast. If a reader finds it difficult
to read the following sections he may omit them and use them as
an appendix whenever these ideas are required later. These
ideas are used only in a few places in the succeeding chapters.

1.35.1. Addition of Matrices. An operation analogous
to the mathematical operation of addition on real numbers will be
defined in this section. In order to develop a theory based on
these mathematical objects called 'matrices' we have to define
operations on them. We will define the operations
'addition', 'scalar multiplication', 'multiplication' and 'inversion'
on matrices.

Definition. Let A=(a_ij) and B=(b_ij); then A+B is defined
as

A+B=(a_ij + b_ij).

It is defined only for matrices of the same category; that is,
A(m×n) and B(m×n) may be added, but A(m×n) and B(m×r)
cannot be added if n≠r, etc.


Ex. 1.35.1. Let

    A = [ 1  2 ]      B = [ -1  0 ]
        [ 3  4 ],         [  2  5 ].

Then

    A+B = [ 1-1  2+0 ]  =  [ 0  2 ]
          [ 3+2  4+5 ]     [ 5  9 ].


Comments. If 0 is a null matrix (a matrix with all elements
zero) then A+0=A. For any matrix A there exists a null
matrix with the same number of rows and columns as A such that
A+0=A. The definition of addition of matrices may be
extended as A+B+C=(a_ij + b_ij + c_ij) etc. It may be noticed that
A+(B+C)=(A+B)+C etc.
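The definition of addition translates directly into code. The sketch below is only an illustration: the function name add_matrices and the third matrix C are assumptions made here, while A and B are the values read off the sum displayed in Ex. 1.35.1.

```python
def add_matrices(a, b):
    """Element-wise sum; defined only when both matrices have the same shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same number of rows and columns")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2], [3, 4]]
B = [[-1, 0], [2, 5]]
C = [[1, 1], [1, 1]]     # hypothetical third matrix for the associativity check
ZERO = [[0, 0], [0, 0]]  # null matrix of the same category as A

print(add_matrices(A, B))          # [[0, 2], [5, 9]]
print(add_matrices(A, ZERO) == A)  # True
print(add_matrices(A, add_matrices(B, C)) == add_matrices(add_matrices(A, B), C))  # True
```

Trying to add a 2×2 matrix to a 2×3 matrix raises an error, which mirrors the restriction that only matrices of the same category may be added.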

1.36. Scalar Multiplication

Definition. kA=(k a_ij) where A=(a_ij) and k is a scalar
quantity.

Let

    A = [ -1   0  -1 ]
        [  2   3   2 ]
        [  0  -1   0 ]

and k=5; then

    5A = [ -5   0  -5 ]
         [ 10  15  10 ]
         [  0  -5   0 ].


Comments. If k=-1 then kA=-A and A+kA=0.
Therefore for every matrix A there exists an inverse with respect to
the operation of addition, namely -A.

1.37. Transpose of a Matrix. Let A=(a_ij) and let A'
denote the transpose of A. Then A'=(a_ji); that is, the (i, j)th
element of A' is the (j, i)th element of A.

Ex. 1.37.1.

(a)  A = [ 1  2  3 ]      A' = [ 1  4 ]
         [ 4  5  6 ],          [ 2  5 ]
                               [ 3  6 ].

(b)  A = [ 1  2  3 ]      A' = [ 1  2  3 ]
         [ 2  5  6 ]           [ 2  5  6 ]
         [ 3  6  1 ],          [ 3  6  1 ].

Comments. The rows of A are the columns of A' and
vice versa. In (b), A'=A; in such a case A is
called a symmetric matrix. It may be noticed that (A')'=A always
for any matrix A. A vector is a special case of a matrix, so if X
is a row vector X' is a column vector and vice versa. A scalar
quantity is a matrix having only one row and one column.
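A short sketch of transposition and the symmetry test follows. The function names are assumptions made for this illustration; the matrices are those of Ex. 1.37.1.

```python
def transpose(a):
    """Rows of the result are the columns of a."""
    return [list(col) for col in zip(*a)]


def is_symmetric(a):
    """A matrix is symmetric when it equals its own transpose."""
    return a == transpose(a)


A = [[1, 2, 3], [4, 5, 6]]             # Ex. 1.37.1 (a)
B = [[1, 2, 3], [2, 5, 6], [3, 6, 1]]  # Ex. 1.37.1 (b)

print(transpose(A))                  # [[1, 4], [2, 5], [3, 6]]
print(transpose(transpose(A)) == A)  # True, i.e. (A')' = A
print(is_symmetric(B))               # True
print(transpose([[7, 8, 9]]))        # a row vector becomes a column vector: [[7], [8], [9]]
```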

1.38. Multiplication of Matrices. The product AB of 
matrices A and B is defined only if the number of columns of A 
equals the number of rows of B, that is if A and B are of the 
form m×n and n×r respectively.

Definition. Let A=(a_ij) and B=(b_ij); then

    AB = ( Σ_{k=1}^{n} a_ik b_kj ),

i.e., the ith row, jth column element in AB is the inner product of
the ith row vector of A and the jth column vector of B.


Ex. 1.38.1. Let

    A = [ 1  2  3 ]      and      B = [  0  1  0 ]
        [ 4  5  6 ]                   [  2  1  5 ]
                                      [ -2  0  4 ];

then

    AB = [ 1×0+2×2+3×(-2),  1×1+2×1+3×0,  1×0+2×5+3×4 ]
         [ 4×0+5×2+6×(-2),  4×1+5×1+6×0,  4×0+5×5+6×4 ]

       = [ -2  3  22 ]
         [ -2  9  49 ].
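The product in Ex. 1.38.1 can be reproduced with a direct implementation of the definition. This is a minimal sketch; the function name multiply and the matrices P and Q are assumptions made here. P and Q are included only to show that the two products PQ and QP can differ.

```python
def multiply(a, b):
    """(i, j) element is the inner product of row i of a and column j of b."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("columns of the first matrix must equal rows of the second")
    return [[sum(a[i][k] * b[k][j] for k in range(n))
             for j in range(len(b[0]))]
            for i in range(len(a))]


A = [[1, 2, 3], [4, 5, 6]]
B = [[0, 1, 0], [2, 1, 5], [-2, 0, 4]]
print(multiply(A, B))  # [[-2, 3, 22], [-2, 9, 49]]

# Hypothetical 2x2 matrices whose products in the two orders differ.
P = [[1, 1], [0, 1]]
Q = [[1, 0], [1, 1]]
print(multiply(P, Q))  # [[2, 1], [1, 1]]
print(multiply(Q, P))  # [[1, 1], [1, 2]]
```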

Comments. If A is of the form m×n (m rows and n
columns) and B is of the form n×r then AB is of the form m×r.
In Ex. 1.38.1, A(2×3).B(3×3)=AB(2×3). If A and B are any
two matrices then AB need not be equal to BA. If AB=BA then A
and B are said to be commutative. If A is a square matrix of
order n then A^2 may be defined as A.A and in general A^k=A...A
(k times) where k is a positive integer. If A is of the form m×n
and B is of the form n×1 then AB defines the multiplication of
a matrix by a vector.

Ex. 1.38.2. Let

    A = [ 2  0  5 ]
        [ 1  4  1 ]

and X=(x_1, x_2, x_3); then

    XA' = (x_1, x_2, x_3) [ 2  1 ]  = (2x_1+5x_3, x_1+4x_2+x_3)
                          [ 0  4 ]
                          [ 5  1 ]

and

    AX' = [ 2  0  5 ] [ x_1 ]  =  [ 2x_1+5x_3     ]
          [ 1  4  1 ] [ x_2 ]     [ x_1+4x_2+x_3 ].
                      [ x_3 ]


Comments. A scalar quantity is a special case of a matrix, so
scalar multiplication may be considered to be a special case of
matrix multiplication. For example,

    5A = 5 [ 1  0 ] A = [ 5  0 ] A
           [ 0  1 ]     [ 0  5 ]

where

    A = [ a_11  a_12 ]
        [ a_21  a_22 ].

Further, the inner product XY' or YX' of two vectors
X=(x_1, x_2, ..., x_n) and Y=(y_1, ..., y_n) may be considered to be
a special case of matrix multiplication.


Ex. 1.38.3. The following system of linear equations,

    a_11 x_1 + a_12 x_2 + ... + a_1n x_n = y_1
    a_21 x_1 + a_22 x_2 + ... + a_2n x_n = y_2
    ...
    a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = y_m

may be written as a single matrix equation AX'=Y', where A=(a_ij),
X=(x_1, x_2, ..., x_n) and Y=(y_1, y_2, ..., y_m).

Comments. If Y is a null vector then the system of linear
equations is called a homogeneous system of linear equations;
otherwise it is called a non-homogeneous system. If all the row
vectors of A, namely (a_i1, a_i2, ..., a_in) for i=1, 2, ..., m, are
independent then the system of equations forms a system of m
independent equations. A system of n independent equations in n
unknowns has a unique solution; otherwise a solution may not exist
and if a solution exists it need not be unique. AX'=Y' may be
considered to be a transformation of X=(x_1, x_2, ..., x_n) into
Y=(y_1, y_2, ..., y_m) where A is the matrix of transformation.
AX'=Y' may also be considered to be a representation of x's and y's
where A may be called the matrix in the representation. The above
representation is called a simple linear representation.

1.39. Rank of a Matrix. The number of independent rows or
columns in a matrix A is called the rank of the matrix A. It is
the dimension of the vector space generated by the row vectors
or the column vectors of A. It can be proved that in any matrix
the number of independent rows equals the number of independent
columns.

Ex. 1.39.1. Let

    A = [ 1  2  3 ]
        [ 2  4  6 ];

then rank of A=ρ(A)=1, where ρ(A) (rho A) is only a notation.

Comments. The second row vector (2, 4, 6) is dependent on
the first row vector (1, 2, 3) since (2, 4, 6)=2(1, 2, 3).
So the rank is one. The second and the third column vectors
are dependent on the first column vector. Hence the number of
independent column vectors is also one. The dimension of the
vector space generated by the row vectors of order 3=the dimension
of the vector space generated by the column vectors of order
2=1. It can be shown that in any matrix, row rank (=the
number of independent row vectors)=column rank (=the
number of independent column vectors)=the rank of the matrix.
The maximum number of independent row or column vectors
possible for a matrix A(m×n) is m if m≤n or n if n≤m.
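The rank can be computed by counting the non-null rows left after row reduction. The text does not prescribe a computational method, so the following is a sketch under that assumption: it uses Gaussian elimination with exact fractions, and the function name rank is an illustrative choice.

```python
from fractions import Fraction


def rank(matrix):
    """Number of independent rows, found by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        # look for a row at or below position r with a non-zero entry in column c
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


A = [[1, 2, 3], [2, 4, 6]]        # Ex. 1.39.1
A_t = [list(col) for col in zip(*A)]  # transpose of A

print(rank(A))    # 1
print(rank(A_t))  # 1, row rank equals column rank
print(rank([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # 3
```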

Exercises 

1.16. Write down the following system of linear equations as a single
matrix equation and also find the rank of the coefficient matrix.

* 1 + + 4a? 4 =0 

2 #!+ x 2 +» 4 =2 
— — 37 4 — 5 • 

1.17. Give two examples each of

(a) a singly stochastic matrix,

(b) a doubly stochastic matrix.

1.18. If A, B, C are three square matrices show that

(a) A+(B+C)=(A+B)+C,

(b) A(B+C)=AB+AC,

(c) A(BC)=(AB)C.

1.19. In an experiment of throwing a balanced coin thrice, write
down the event of getting at least one head in matrix notation.

1.20. If A and B are two square matrices of order n show that

(a) ρ(A+B) ≤ ρ(A)+ρ(B),

(b) ρ(AB) ≤ min [ρ(A), ρ(B)] where ρ denotes the rank.

1.4. SINGULAR AND NON-SINGULAR MATRICES

The idea of singularity of matrices is closely associated with
the linear independence of vectors. A brief introduction to the
singularity of matrices is given here.

1.41. Singular matrices. A square matrix A of order n is
said to be a singular matrix if the rank of A is less than n. If the
rank of A equals n then it is said to be non-singular. If the
matrix of a linear transformation is singular the transformation
is a singular transformation. If the coefficient matrix of a system
of linear equations is singular then the system is said to be a
singular system of equations; otherwise it is called a non-singular
system.


Ex. 1.41.1. Let

    A = [ 1  0  0 ]
        [ 0  1  0 ]
        [ 2  3  0 ];

then A is singular.

Comments. Here the third row=twice the first row+3
times the second row, and the first two rows are evidently independent
(unit vectors). Hence the rank of A=ρ(A)=2.

1.42. Identity Matrix. If in a square matrix of order n
the ith row is the ith unit vector for i=1, 2, ..., n then the square
matrix is called an identity matrix of order n and is usually
denoted by I or I_n.


Ex. 1.42.1. Let

    I = [ 1  0  0 ]
        [ 0  1  0 ]
        [ 0  0  1 ];

then I is an identity matrix of order 3.


Comments. An identity matrix is said to have unities as the
leading diagonal elements. If A=(a_ij) then a_11, a_22, ..., a_nn
are the leading diagonal elements of A. If in a matrix all the
non-diagonal elements are zero then the matrix is called a diagonal
matrix. Diagonal matrices are usually denoted by D or
diag (d_1, d_2, ..., d_n) where d_1, d_2, ..., d_n denote the
diagonal elements (some of the d's may be zero). If all the d's are
zero then the matrix is evidently a null matrix and if all the
diagonal elements are unities then it is an identity matrix. If all
the elements above or below the leading diagonal are zeros then such
a matrix is called a triangular matrix. It may be noticed that
IA=AI=A when A is a square matrix of order n. Also D_1 D_2=D_3
where D_1, D_2 and D_3 are diagonal matrices, and Δ_1 Δ_2=Δ_3
where Δ_1, Δ_2 and Δ_3 (Δ is delta) are triangular matrices of the
same type. If A and B are two matrices such that AB=BA then A and
B are said to commute. It may be noticed that I_n commutes with any
square matrix of order n, where I_n is the identity matrix of order
n. It can be proved that for any square matrix A there exist two
non-singular matrices P and Q such that PAQ=D where D is a diagonal
matrix. In such a case D is called the canonical form of A.

Exercises

defln e d 1 thLshottha\ B p(AS)!rp(OBT4"o“ lar ^ ‘ f A °' ° B ars 

-* fchat^A=0 t B S?£S 2^ " su “ h tta ‘ AB - «- 

1.23. Show that ρ(AA')=ρ(A'A)=ρ(A).


=r: i] 


1*24. If A=l | and if AB=I find out B. 

3 4 


1.5. GEOMETRY OF VECTORS AND MATRICES

1.50. Introduction. So far we have been considering some
elementary properties of vectors and matrices. In this section we
will consider the geometrical representation of vectors. The
reader may be familiar with the geometrical representation of
vectors of order one, that is, scalar quantities. In this section, for
convenience, we are going to assume that the elements are all
real numbers. Vectors of order one, for example, 5, 7, -2 etc. may
be represented as points in a one dimensional space (or a line).

    -2      0      5      7

Fig. 1.4.

If we consider vectors of order 2, for example (-1, 2), (3, 5)
etc., these may be represented as points in a two dimensional
space.

Fig. 1.5.

Here the first element in the vector (the first co-ordinate)
is taken along one axis (say the X_1-axis) and the second
co-ordinate is taken along the other axis (say the X_2-axis). For
example in the vector (3, 5), 3 is the X_1 co-ordinate and 5 is the
X_2 co-ordinate. Any vector (x_1, x_2) of order 2 may be considered
to be a point in a two dimensional space or a plane. In general a
vector (x_1, x_2, ..., x_n) of order n may be considered to be a
point in an n-dimensional space. (In the geometrical language
'dimension' refers to the order of the vectors whereas in algebraic
language 'dimension' of a vector space means the number of linearly
independent vectors which generate the vector space under
consideration. The reader may notice the meaning of dimension in the
two usages).

1.51. Length of the Vector. The distance between the
point represented by a vector and the origin may be taken as the
length of a vector. For example if X=(x_1, x_2) is a vector of order
2, that is, a point in a two dimensional space as shown in Fig. 1.5,
then the square of the distance between the point A and the origin
(i.e., OA^2)=x_1^2+x_2^2.

||X||^2=x_1^2+x_2^2=XX'=square of the length of the vector
X. In general if X=(x_1, ..., x_n) is a vector of order n then its
length may be defined by

||X||^2=x_1^2+x_2^2+...+x_n^2=XX'.

Evidently when X=(0, 0, ..., 0), i.e., a null vector, then ||X||=0;
otherwise ||X||>0.

1.52. Inner product. The inner product of two vectors
X=(x_1, x_2) and Y=(y_1, y_2) was defined as XY'=x_1 y_1+x_2 y_2.
In Fig. 1.5,

cos θ = cos (φ-ψ) = cos φ cos ψ + sin φ sin ψ

      = x_1 y_1 / [√(x_1^2+x_2^2) √(y_1^2+y_2^2)]
        + x_2 y_2 / [√(x_1^2+x_2^2) √(y_1^2+y_2^2)]

      = XY' / (||X|| . ||Y||)

where θ (theta), φ (phi), ψ (psi) are all Greek letters. In general if
X=(x_1, ..., x_n) and Y=(y_1, ..., y_n) and if θ is the angle between
them, then

cos θ = XY' / (||X|| . ||Y||).

|cos θ| ≤ 1 and therefore |XY'| ≤ ||X|| . ||Y|| (this inequality
is known as Schwarz's inequality). When XY'=0, i.e.,
when the vectors are orthogonal, θ=90°.
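The length, the inner product and the cosine of the angle can be computed directly from these formulas. The sketch below uses hypothetical vectors X, Y and Z chosen for illustration; the function names are likewise assumptions made here.

```python
import math


def inner(x, y):
    """Inner product XY' of two vectors of the same order."""
    return sum(a * b for a, b in zip(x, y))


def length(x):
    """Length ||X|| = square root of XX'."""
    return math.sqrt(inner(x, x))


def cos_angle(x, y):
    """Cosine of the angle between two non-null vectors."""
    return inner(x, y) / (length(x) * length(y))


X = (3.0, 4.0)
Y = (4.0, -3.0)  # orthogonal to X, since XY' = 12 - 12 = 0
Z = (1.0, 2.0)

print(length(X))        # 5.0
print(cos_angle(X, Y))  # 0.0, so the angle is 90 degrees
print(abs(inner(X, Z)) <= length(X) * length(Z))  # True (Schwarz's inequality)
```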

1.53. Addition and Scalar Multiplication. It may be
easily noticed that if X=(x_1, x_2) and Y=(y_1, y_2) are as shown in
Fig. 1.6 then X+Y=(x_1+y_1, x_2+y_2) is a vertex of the
parallelogram as shown in Fig. 1.6, and hence ||X+Y|| is the length
of the diagonal OC. From the property of a triangle,

||X+Y|| ≤ ||X|| + ||Y||.

Fig. 1.6.

It is easily seen from Fig. 1.7 that kX=k(x_1, x_2) is a
point A_1 on the line OA where A is the point X=(x_1, x_2) and k
is a scalar quantity.

||kX|| = |k| . ||X||.

If X and Y are two points as in Fig. 1.7 then any point in the
2-dimensional space may be obtained as a linear combination of
X and Y, i.e., two independent vectors (vectors which are not
points on the same line) of order two generate the entire two
dimensional space in the sense that any point in the space may be
obtained as a linear combination of X and Y or any point Z may
be written as

Z=aX+bY

where a and b are scalar quantities. All these results may be
generalized to an n-dimensional space, i.e., n independent vectors
of order n generate the entire n-dimensional space.

1.54. Some useful results. It is seen that the length ||X||
of a vector X satisfies the following conditions:

(1) ||X|| ≥ 0 and ||X||=0 ⇔ X=0.

(⇔ means 'implies both ways', or ||X||=0 if and only if X
is a null vector).

(2) ||kX|| = |k| . ||X|| where k is a scalar quantity.

(3) ||X+Y|| ≤ ||X|| + ||Y|| where Y is another vector of
the same order.

Any function of the elements of X satisfying these
conditions is called a 'norm' of the vector X=(x_1, x_2, ..., x_n)
and is usually denoted by ||X||. Evidently the length of X is also a
norm. Other examples for 'norm' of X are

(a) max_i |x_i| = ||X||_∞ say,

where max_i |x_i| means the largest of the magnitudes of
x_1, x_2, ..., x_n. For example if X=(-1, 2, -5) then max_i |x_i|=5.

(b) Σ_i |x_i| = ||X||_1 say.

(c) {Σ_i |x_i|^p}^(1/p) = ||X||_p say (p ≥ 1),

i.e., when p=2, ||X||_2^2 gives the square of the length
of X, namely, Σ_i |x_i|^2=x_1^2+...+x_n^2 (here all x's are
assumed to be real).

In a similar way 'norm' of a square matrix A, say ||A||,
may be defined as a function of the elements a_ij of A, satisfying
the conditions:

1. ||A|| ≥ 0 and ||A||=0 ⇔ A=0.

2. ||kA|| = |k| . ||A|| where k is a scalar quantity.

3. ||A+B|| ≤ ||A|| + ||B|| where B is another square
matrix of the same order as A.

4. ||AB|| ≤ ||A|| . ||B||.

The following are some examples for ||A||:

(α) max_i Σ_{j=1}^{n} |a_ij| = ||A||_I say.

(β) max_j Σ_{i=1}^{n} |a_ij| = ||A||_II say.

(γ) {Σ_{i,j} |a_ij|^2}^(1/2) = ||A||_III say.

(δ) n max_{i,j} |a_ij| = ||A||_IV say.
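The vector and matrix norms listed above can be evaluated with a few lines of code. This is an illustrative sketch: the function names are assumptions made here, X is the vector (-1, 2, -5) from the text, and the matrix A is a hypothetical example.

```python
import math


def vec_norm_max(x):
    """Largest magnitude among the elements of x."""
    return max(abs(v) for v in x)


def vec_norm_p(x, p):
    """{sum |x_i|^p}^(1/p) for p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def mat_norm_row(a):
    """Largest row sum of magnitudes (example alpha)."""
    return max(sum(abs(v) for v in row) for row in a)


def mat_norm_col(a):
    """Largest column sum of magnitudes (example beta)."""
    return max(sum(abs(v) for v in col) for col in zip(*a))


def mat_norm_euclid(a):
    """Square root of the sum of squared magnitudes (example gamma)."""
    return math.sqrt(sum(abs(v) ** 2 for row in a for v in row))


def mat_norm_nmax(a):
    """n times the largest magnitude, for a square matrix of order n (example delta)."""
    return len(a) * max(abs(v) for row in a for v in row)


X = (-1, 2, -5)
A = [[1, -2], [3, 4]]  # hypothetical square matrix of order 2

print(vec_norm_max(X))   # 5
print(vec_norm_p(X, 1))  # 8.0
print(mat_norm_row(A))   # 7
print(mat_norm_col(A))   # 6
print(mat_norm_nmax(A))  # 8
```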

1.55. Linear Transformations

Fig. 1.8.

Let A=(x_1, x_2) be a point in a two dimensional space with
reference to the rectangular axes of co-ordinates OX_1 and OX_2 as
shown in Fig. 1.8. Let OY_1 and OY_2 be two lines passing through
O making angles θ with OX_1 and φ with OX_2 as shown in Fig. 1.8.
The same point (x_1, x_2) may be written as
(y_1, y_2) with reference to the axes of co-ordinates OY_1 and
OY_2. It may be easily seen that

    y_1 = a_11 x_1 + a_12 x_2
and y_2 = a_21 x_1 + a_22 x_2

where a_11, a_12, a_21, a_22 are functions of θ and φ,

i.e., AX'=Y' where

    A = [ a_11  a_12 ]
        [ a_21  a_22 ],   X=(x_1, x_2) and Y=(y_1, y_2),

in general represents only a change in the co-ordinate system. It
may be further noticed that when θ=φ, or when the angle between
OY_1 and OY_2 is π/2=90°, then the matrix A satisfies the condition

    AA' = I = [ 1  0 ]
              [ 0  1 ].

In this case the transformation is called an orthogonal
transformation. An orthogonal transformation is nothing but a
rotation of the axes of co-ordinates. In general a linear
transformation AX'=Y' where A is a non-singular matrix of order n and
X and Y are vectors of order n represents a change in
the co-ordinate system. If the transformation is orthogonal or
if AA'=I then the transformation is a rotation of the co-ordinate
system.
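The condition AA'=I can be verified numerically. The text only says that the a_ij are functions of θ and φ; the explicit matrix used below, with rows (cos θ, sin θ) and (-sin θ, cos θ), is the standard rotation-of-axes matrix and is an assumption introduced for this sketch, as are the function names.

```python
import math


def rotation_matrix(theta):
    """Assumed form of A for a rotation of the axes through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def transpose(a):
    return [list(col) for col in zip(*a)]


def multiply(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_orthogonal(a, tol=1e-12):
    """Check AA' = I up to a small rounding tolerance."""
    product = multiply(a, transpose(a))
    n = len(a)
    return all(abs(product[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


def transform(a, x):
    """Co-ordinates Y of the point X in the new system, Y' = AX'."""
    return [sum(a[i][k] * x[k] for k in range(len(x))) for i in range(len(a))]


A = rotation_matrix(math.pi / 6)
X = [3.0, 4.0]
Y = transform(A, X)

print(is_orthogonal(A))                       # True
print(is_orthogonal([[1.0, 1.0], [0.0, 1.0]]))  # False: non-singular but not orthogonal
print(math.isclose(math.hypot(*X), math.hypot(*Y)))  # True: the length is unchanged
```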
Exercises

1.25. If the following norms

    ||A||_I = max_i Σ_{j=1}^{n} |a_ij|,   ||A||_II = max_j Σ_{i=1}^{n} |a_ij|,

    M(A) = n max_{i,j} |a_ij|   and   N(A) = (Σ_{i,j} |a_ij|^2)^(1/2)

are given, show that

(a) (1/n) M(A) ≤ ||A||_I ≤ M(A);
(b) (1/n) M(A) ≤ ||A||_II ≤ M(A);
(c) (1/n)^(1/2) N(A) ≤ ||A||_I ≤ (n)^(1/2) N(A);
(d) (1/n)^(1/2) N(A) ≤ ||A||_II ≤ (n)^(1/2) N(A).

1.26. An elementary matrix is defined as a matrix obtained by
(1) multiplying any row (column) by a scalar quantity, (2) interchanging any
two rows (columns), (3) adding a linear combination of rows (columns) to any
row (column) of an identity matrix. Multiplication of a matrix by an
elementary matrix is called an elementary transformation. Show that any
square matrix A may be reduced to a diagonal matrix D by elementary
transformations.

1.6. DETERMINANTS

1.60. Introduction. A square matrix is seen to be an
arrangement of n^2 quantities into n rows and n columns. In this
section we will define a function of the elements of a square
matrix. In the previous section we have seen a class of functions
defined by their general properties or by a set of axioms or
assumptions. Here also the determinant of a square matrix A will be
defined as a function of the row or column vectors of A satisfying
a set of axioms.

1.61. Some Axioms. Let α_1, α_2, ..., α_n be the row (or
column) vectors of a square matrix A; then the determinant of A
(denoted by det A or |A|) is defined as a function of α_1, α_2, ...,
α_n satisfying the following conditions.

1. det (α_1, ..., cα_i, ..., α_n) = c . det (α_1, ..., α_i, ..., α_n)
   where c is a scalar quantity.

2. det (α_1, ..., α_i+α_j, ..., α_j, ..., α_n)
   = det (α_1, ..., α_i, ..., α_j, ..., α_n).

3. det (α_1, ..., β_i+γ_i, ..., α_n) = det (α_1, ..., β_i, ..., α_n)
   + det (α_1, ..., γ_i, ..., α_n).

4. det (e_1, ..., e_n) = 1.

Axiom (1) says that if any row (or column) of A is multiplied
by a scalar quantity, it is equivalent to multiplying the
determinant by the same scalar quantity. Axiom (2) says that the
value of a determinant is not altered if any row (column) is added
to another row (column). The third axiom implies that if any
row (column) is split into two vectors, β_i and γ_i, such that
α_i=β_i+γ_i then the determinant can be considered to be the sum
of two determinants, one with α_i replaced by β_i and the other
with α_i replaced by γ_i. Condition (4) is equivalent to saying
that if the various rows (columns) are the various unit vectors
in the natural order then the value of the determinant is unity.
Some of the following theorems will enable us to evaluate the
determinant of a square matrix A.

Theorem 1.6.1. If one row (column) of a square matrix A 
is null then | A | =0. This follows immediately from axiom (1) 
by taking c=0. 

Theorem 1.6.2. The determinant of a singular matrix is
zero.


Proof. A singular matrix, according to the definition has 
dependent rows. So at least one of the rows may be written as 
linear combinations of others. Applying axioms (2) and (1) this 
row can be reduced to a null vector. Hence the result. 


Theorem 1.6.3. The value of a determinant is not changed 
if a constant multiple of one row (column) is added to another 
row (column). This can be easily shown by applying axioms (2) 


Theorem 1.6.4. If two rows (columns) of a determinant 
are interchanged then the magnitude of the determinant is not 
changed but the sign is changed. 


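Proof. Only the ith and jth rows are written; the other rows
remain unchanged throughout.

| A | = D(..., α_i, ..., α_j, ...)
      = D(..., α_i, ..., α_i+α_j, ...)          (axiom 2)
      = -D(..., α_i, ..., -α_i-α_j, ...)        (axiom 1)
      = -D(..., -α_j, ..., -α_i-α_j, ...)       (axiom 2)
      = D(..., α_j, ..., -α_i-α_j, ...)         (axiom 1)
      = D(..., α_j, ..., -α_i, ...)             (axiom 2)
      = -D(..., α_j, ..., α_i, ...)             (axiom 1)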


1.62. Cofactor of a determinant. If the ith row of |A|
is replaced by the jth unit vector e_j then the resulting
determinant is called the cofactor of a_ij and is usually written
as |A_ij|.

Ex. 1.62.1.

    | 0     1     0    ...  0    |
    | a_21  a_22  a_23 ...  a_2n |
    | ...                        |   = |A_12| = Cofactor of a_12 in |A|.
    | a_n1  a_n2  a_n3 ...  a_nn |

Theorem 1.6.5. A determinant |A| may be expanded in
terms of the cofactors of the elements of any row (column).

Proof. Let A=(a_ij). Let the first row vector be

α_1 = a_11 e_1 + a_12 e_2 + ... + a_1n e_n

where e_1, e_2, ..., e_n are the unit vectors.

By applying axiom (3) successively,

| A | = D(α_1, ..., α_n)
      = D(a_11 e_1, α_2, ..., α_n) + D(a_12 e_2, α_2, ..., α_n)
        + ... + D(a_1n e_n, α_2, ..., α_n)
      = a_11 D(e_1, α_2, ..., α_n) + a_12 D(e_2, α_2, ..., α_n)
        + ... + a_1n D(e_n, α_2, ..., α_n)
      = a_11 |A_11| + a_12 |A_12| + ... + a_1n |A_1n|.

Theorem 1.6.6.

a_i1 |A_k1| + a_i2 |A_k2| + ... + a_in |A_kn| = 0 if k≠i

where A=(a_ij); i.e., the sum of products of the elements of
one row (column) and the cofactors of the elements of any other
row (column) is zero.

Proof. In each determinant below the unit vector occupies the
kth row.

a_i1 |A_k1| + a_i2 |A_k2| + ... + a_in |A_kn|

  = a_i1 D(α_1, ..., α_i, ..., e_1, α_{k+1}, ..., α_n)
    + a_i2 D(α_1, ..., α_i, ..., e_2, α_{k+1}, ..., α_n)
    + ... + a_in D(α_1, ..., α_i, ..., e_n, α_{k+1}, ..., α_n)

  = D(α_1, ..., α_i, ..., a_i1 e_1, α_{k+1}, ..., α_n)
    + ... + D(α_1, ..., α_i, ..., a_in e_n, α_{k+1}, ..., α_n)

  = D(α_1, ..., α_i, ..., α_i, α_{k+1}, ..., α_n) = 0

since the ith and kth rows are the same.

Theorem 1.6.7. | A | = Σ ± a_1i a_2j ... a_nk, where the sum is
over all i, j, ..., k (all different).

Proof. Let α_1 = a_11 e_1 + a_12 e_2 + ... + a_1n e_n.

| A | = a_11 |A_11| + a_12 |A_12| + ... + a_1n |A_1n|     ....(1)

But α_i = a_i1 e_1 + a_i2 e_2 + ... + a_in e_n ; i=1, 2, ..., n.

| A | = a_11 D(e_1, α_2, ..., α_n) + ... + a_1n D(e_n, α_2, ..., α_n)

      = Σ_{i, j, ..., k} a_1i a_2j ... a_nk D(e_i, e_j, ..., e_k)

where the summation is for all i, j, ..., k and each suffix varies
from 1 to n.

D(e_i, e_j, ..., e_k) is a determinant where the rows (columns) are
unit vectors in some order.

Therefore D(e_i, e_j, ..., e_k) = ±1 or 0.

Hence | A | = Σ ± a_1i a_2j ... a_nk, with i≠j≠...≠k (all different),

i.e., |A| contains a number of terms where every term contains
one and only one element from every row and column of A.

From the expansion of |A| it can be easily seen that the cofactor
of a_ij in a matrix A=(a_ij) is the determinant obtained by
deleting the ith row and jth column of A, multiplied by (-1)^(i+j).
The determinant obtained from A by deleting the ith row and jth
column of A is called the minor of a_ij in A,

i.e., Cofactor of a_ij = (-1)^(i+j) × minor of a_ij.
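The relation between minors, cofactors and the expansion along a row can be sketched as a short recursive routine. The function names and the matrix M are illustrative assumptions, and rows and columns are numbered from 0 in the code. The last two lines check the expansion along the second row and the statement of Theorem 1.6.6 for this M.

```python
def minor_matrix(a, i, j):
    """Matrix left after deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by expansion along the first row."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor_matrix(a, 0, j)) for j in range(n))


def cofactor(a, i, j):
    """(-1)^(i+j) times the minor of the (i, j) element."""
    return (-1) ** (i + j) * det(minor_matrix(a, i, j))


M = [[1, 2, 3], [-1, 2, 4], [3, 0, 5]]  # hypothetical example

print(det(M))                                         # 26
print(sum(M[1][j] * cofactor(M, 1, j) for j in range(3)))  # 26, expansion along the second row
print(sum(M[0][j] * cofactor(M, 1, j) for j in range(3)))  # 0, elements of one row with cofactors of another
```

The recursion is easy to read but its cost grows very quickly with the order of the matrix, so it is suitable only for small examples.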

1.63. Simple Rules for Evaluation of Determinants.
Some simple rules for evaluation of determinants of order 2 and 3
are suggested here. Determinants of higher orders may be evaluated
by using the theorems 1.6.1 to 1.6.7. A determinant of order
2 may be evaluated as

    | a  b |
    | c  d |  = ad-bc.

This may be easily verified by splitting the rows α_1=(a, b) and
α_2=(c, d) in terms of unit vectors and expanding |A|.

A determinant of order 3 may be evaluated as follows:

          | a_1  a_2  a_3 |
    |A| = | b_1  b_2  b_3 |
          | c_1  c_2  c_3 |

        = a_1 | b_2  b_3 |  -  a_2 | b_1  b_3 |  +  a_3 | b_1  b_2 |     ....(3)
              | c_2  c_3 |         | c_1  c_3 |         | c_1  c_2 |

The determinants of the various submatrices

    | b_2  b_3 |     | b_1  b_3 |     | b_1  b_2 |
    | c_2  c_3 | ,   | c_1  c_3 | ,   | c_1  c_2 |

are the minors of a_1, a_2 and a_3 respectively.

Equation (3) is the expansion of |A| in terms of the elements
in the first row and their cofactors. Similarly |A| may be
expanded in terms of the elements and their cofactors of any other
row (column).

|A| = a_1(b_2 c_3 - b_3 c_2) - a_2(b_1 c_3 - b_3 c_1) + a_3(b_1 c_2 - b_2 c_1)

    = a_1 b_2 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_1 b_3 c_2 - a_2 b_1 c_3 - a_3 b_2 c_1.

There are six terms in the expansion and further each term
contains one and only one element from every row and column of
A. From the nature of the terms in the expansion of |A| a mechanical
way of writing down the terms in the expansion of a third
order determinant may be suggested as follows:

Let

          | a_1  a_2  a_3 |
    |A| = | b_1  b_2  b_3 |
          | c_1  c_2  c_3 |

Augment the columns of A with any two consecutive columns
of A. For example if A is augmented with the first and second
columns of A then the resulting arrangement is as follows:

    a_1  a_2  a_3  a_1  a_2
    b_1  b_2  b_3  b_1  b_2
    c_1  c_2  c_3  c_1  c_2

(straight lines join the three diagonals running downwards from
a_1, a_2 and a_3; dotted lines join the three diagonals running
upwards from c_1, c_2 and c_3).

Then multiply and add elements joined by straight lines.
From this sum subtract the sum of products of elements joined by
dotted lines,

i.e., |A| = a_1 b_2 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_3 b_2 c_1 - a_1 b_3 c_2 - a_2 b_1 c_3.

Determinants in general may be evaluated by using the
axioms or theorems given in this section.


Ex. 1.63.1. Evaluate the determinant

          | 1   0  -1   2 |
    |A| = | 2  -1   1   0 |
          | 3   0   2  -4 |
          | 1   2   4   5 |

Solution: Add (-2) times the first row to the second row
(the value of |A| is not changed).

    | 1   0  -1   2 |     | 1   0  -1   2 |
    | 2  -1   1   0 |  =  | 0  -1   3  -4 |
    | 3   0   2  -4 |     | 3   0   2  -4 |
    | 1   2   4   5 |     | 1   2   4   5 |

Similarly add (-3) times the first row of A to the third row
and (-1) times the first row of A to the 4th row of A; then

          | 1   0  -1    2 |
    |A| = | 0  -1   3   -4 |
          | 0   0   5  -10 |
          | 0   2   5    3 |

Expand |A| in terms of the elements of the first column
and their cofactors.

              | -1   3   -4 |                  | -1   3   -4 |
    |A| = 1 × |  0   5  -10 | + 0 + 0 + 0  =   |  0   5  -10 |
              |  2   5    3 |                  |  2   5    3 |

Add 2 times the first row to the third row; then

          | -1   3   -4 |
    |A| = |  0   5  -10 |
          |  0  11   -5 |

Expand |A| in terms of the elements of the first column and
their cofactors.

    |A| = (-1) |  5  -10 | + 0 + 0  =  - |  5  -10 |
               | 11   -5 |               | 11   -5 |

        = -(-25+110) = -85.
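The same value can be obtained mechanically by reducing the matrix to triangular form with row operations and multiplying the diagonal elements. The following is a sketch under stated assumptions: the function name is chosen here, exact fractions are used, and the routine relies on the facts that adding a multiple of one row to another leaves the determinant unchanged (Theorem 1.6.3) and that interchanging two rows changes its sign (Theorem 1.6.4).

```python
from fractions import Fraction


def det_by_elimination(matrix):
    """Determinant via reduction to triangular form with exact arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign = 1
    for c in range(n):
        # find a row at or below the diagonal with a non-zero entry in column c
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # dependent rows: the matrix is singular
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign        # a row interchange changes the sign
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= a[i][i]       # product of the diagonal of the triangular form
    return result


A = [[1, 0, -1, 2],
     [2, -1, 1, 0],
     [3, 0, 2, -4],
     [1, 2, 4, 5]]

print(det_by_elimination(A))  # -85
```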

Comments. It is easy to see that

(1) |I| = 1.

(2) |D| = d_1 d_2 ... d_n, where D is a diagonal matrix with
diagonal elements d_1, d_2, ..., d_n.

(3) |T| = t_1 t_2 ... t_n, where T is a triangular matrix with
the diagonal elements t_1, t_2, ..., t_n.

For example

    | 2  1  0   5 |
    | 0  7  4  -4 |
    | 0  0  3   0 |  = 2×7×3×4 = 168.
    | 0  0  0   4 |

(4) |AB| = |A| |B| (A and B are square matrices).

(5) |A'| = |A| where A' is the transpose of A.
This follows from symmetry or from the definition itself.

(6) |P| = ±1 if P is an orthogonal matrix.

Proof. If P is orthogonal, P'P=I.

|PP'| = |P| |P'| = |P|^2 = 1. Hence the result.

Exercises 

1.27. Evaluate the determinant of the square matrix A, where

    A = [ 5  6  3  2 ]
        [ 0  1  0  4 ]
        [ 1  2  3  4 ]
        [ 6  0  4  0 ]

1.28. If B=PAP' where P is an orthogonal matrix, show that
|B| = |A|.

1.29. When a matrix A is reduced to a diagonal matrix D by
elementary transformations, D is called the canonical form of A.
Show that the canonical form of a singular matrix has at least one
zero diagonal element.


1.7. INVERSE OF A MATRIX

The inverse A^-1 of a non-singular matrix A is defined as the
matrix A^-1=(a^ij) where a^ij is the cofactor of the (j, i)th element
of A (that is, a_ji), divided by the determinant of A:

    a^ij = |A_ji| / |A|,

where |A_ji| is the cofactor of a_ji in A.
Ex. 1.7.1. Let

    A = [  1  2  3 ]
        [ -1  2  4 ]
        [  3  0  5 ]

Then

    A^-1 = [ a^11  a^12  a^13 ]
           [ a^21  a^22  a^23 ]
           [ a^31  a^32  a^33 ]

where

    a^11 = |A_11| / |A| = | 2  4 | / |A|,
                          | 0  5 |

    a^12 = |A_21| / |A| = - | 2  3 | / |A|
                            | 0  5 |

etc.

Comments. If A is singular then |A|=0. Therefore
the inverse is not defined for a singular matrix A.

Theorem 1.7.1. A.A^-1 = I = A^-1.A.

Proof. Let A=(a_ij). Then

    A^-1 = (1/|A|) [ |A_11|  |A_21|  ...  |A_n1| ]
                   [ |A_12|  |A_22|  ...  |A_n2| ]
                   [  ...                        ]
                   [ |A_1n|  |A_2n|  ...  |A_nn| ]

    A.A^-1 = (1/|A|) [ a_11  a_12 ... a_1n ] [ |A_11|  |A_21|  ...  |A_n1| ]
                     [ a_21  a_22 ... a_2n ] [ |A_12|  |A_22|  ...  |A_n2| ]
                     [  ...                ] [  ...                        ]
                     [ a_n1  a_n2 ... a_nn ] [ |A_1n|  |A_2n|  ...  |A_nn| ]

           = (1/|A|) [ |A|   0   ...   0  ]
                     [  0   |A|  ...   0  ]
                     [ ...                ]
                     [  0    0   ...  |A| ]

           = I.

The diagonal elements follow from Theorem 1.6.5 and the zero
non-diagonal elements from Theorem 1.6.6.

Similarly A^-1.A = I.



Comments. It may be easily seen that

(1) If D=diag (d_1, d_2, ..., d_n), d_i≠0 for i=1, 2, ..., n then
D^-1=diag (d_1^-1, d_2^-1, ..., d_n^-1).

(2) (AB)^-1 = B^-1.A^-1.

(3) If P is orthogonal P'=P^-1.

(4) I^-1=I. (5) If PAP'=D then |A| = |D| where P is an
orthogonal matrix and D is a diagonal matrix.
1.71. Solutions of Linear Equations. In a system of
linear equations AX'=Y' if A is a square matrix of order n and rank
n (i.e., the system is a non-singular system of linear equations)
then X'=A^-1 Y' (obtained by pre-multiplying AX'=Y' by A^-1).
It can be proved that a system of linear equations AX'=Y' is
consistent (or has a solution, where A and Y are known and X is
unknown), if the ranks of A and C, where C is a matrix obtained
by augmenting A with Y', are equal. If the number of independent
equations is less than the number of unknowns, then the equations
have an infinite number of solutions. For a system AX'=0 if the
rank of A is r then the total number of independent solutions is
n-r. If r=n then there is only a trivial solution X=0. The
general solution of a non-homogeneous system of linear equations
AX'=Y' where A and Y are known, is obtained by adding a
particular solution of AX'=Y' to the general solution of the
corresponding homogeneous system AX'=0.

Ex. 1.71.1. Solve 2x+3y=5.

Sol: Let y=0; then x=5/2.

For various values of y various values of x are obtained.
This system has an infinite number of solutions.

Ex. 1.71.2. Solve

    2x+3y=5
     x+ y=2

or

    [ 2  3 ] [ x ]  =  [ 5 ]
    [ 1  1 ] [ y ]     [ 2 ]

or AX'=Y'.

Sol: Here A is non-singular [since A has rank 2, or the
vectors (2, 3) and (1, 1) are independent]. Therefore

    [ x ]  =  A^-1 Y'  =  [ -1   3 ] [ 5 ]  =  [ 1 ]
    [ y ]                 [  1  -2 ] [ 2 ]     [ 1 ]

since |A|=-1, or

    x=1 and y=1.

The solution is unique.

For a simple system like two equations in two unknowns, the
classical method of elimination of variables is easier than inverting
a matrix. But the above method is a general method which can
be applied when the system of equations involves a number of
equations and a number of unknowns.
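For comparison, the classical elimination of variables can itself be written as a general routine. The sketch below uses Gauss-Jordan elimination on the augmented matrix rather than the inverse formula; this is a deliberately different technique from the one used in the worked example, and the function name solve is an assumption made here.

```python
from fractions import Fraction


def solve(a, y):
    """Unique solution of AX' = Y' for a non-singular square matrix A."""
    n = len(a)
    # augment A with the column Y'
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, y)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if pivot is None:
            raise ValueError("coefficient matrix is singular; no unique solution")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pivot_value = aug[c][c]
        aug[c] = [v / pivot_value for v in aug[c]]
        # eliminate the unknown number c from every other equation
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[c])]
    return [row[n] for row in aug]


# Ex. 1.71.2: 2x + 3y = 5, x + y = 2
print(solve([[2, 3], [1, 1]], [5, 2]))  # [Fraction(1, 1), Fraction(1, 1)]
```

A singular coefficient matrix, such as one with two proportional rows, makes the routine raise an error instead of returning a solution, in line with the remark that such a system has no unique solution.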


1.72. Quadratic and Bilinear Forms. A few definitions
and some elementary properties of quadratic and bilinear forms
are given in this section.

A quadratic form XAX' is said to be positive definite,
positive semi-definite, negative definite or negative semi-definite
if XAX' >0, ≥0, <0, ≤0 respectively, for any real non-null vector
X, where X is a row vector, X' is its transpose and A is a
symmetric matrix. That is, for any non-null real vector X, if

    XAX' > 0 then XAX' is positive definite,
    XAX' ≥ 0 then XAX' is positive semi-definite,
    XAX' < 0 then XAX' is negative definite, and
    XAX' ≤ 0 then XAX' is negative semi-definite.

Without loss of generality A is assumed to be a symmetric
matrix.

    XAX' = Σ_{i=1}^{n} Σ_{j=1}^{n} a_ij x_i x_j.

Since x_i x_j=x_j x_i, if A is not a symmetric matrix it can be
replaced by the symmetric matrix B=(A+A')/2 without changing the
value of the quadratic form.
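A quadratic form is easy to evaluate from the double sum Σ_i Σ_j a_ij x_i x_j. The following sketch uses hypothetical matrices and vectors chosen for illustration, and the function names are assumptions made here. It also shows why the matrix may be taken as symmetric: a non-symmetric matrix C and its symmetrized version (C+C')/2 give the same value of the form.

```python
from fractions import Fraction


def quadratic_form(x, a):
    """Value of XAX' = sum over i, j of a_ij x_i x_j."""
    n = len(x)
    return sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def symmetrize(a):
    """Symmetric matrix (A + A')/2, which defines the same quadratic form."""
    n = len(a)
    return [[Fraction(a[i][j] + a[j][i], 2) for j in range(n)] for i in range(n)]


A = [[2, 1], [1, 2]]  # hypothetical symmetric matrix
C = [[1, 4], [0, 1]]  # hypothetical non-symmetric matrix
B = symmetrize(C)     # equals [[1, 2], [2, 1]]

print(quadratic_form((1, -1), A))  # 2
print(quadratic_form((1, 1), A))   # 6
for x in [(1, 1), (1, -1), (2, 3)]:
    print(quadratic_form(x, C) == quadratic_form(x, B))  # True each time
```

Evaluating the form at a few vectors can show that it is not definite (one negative value rules out positive definiteness), but it cannot by itself prove definiteness, which is a statement about every non-null vector.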


Exercises

1.30. Show that if A and B are non-singular square matrices of order n,

(a) (AB)^{-1} = B^{-1}A^{-1},

(b) (A^{-1})^{-1} = A.

1.31. Show that | A^{-1} | = 1/| A | if | A | ≠ 0.

1.32. The trace of a square matrix A is defined as the sum of the leading diagonal elements. Show that tr(AB) = tr(BA).

1.33. If P is an orthogonal matrix show that tr PAP' = tr A.

1.34. A matrix A such that A² = A is called an idempotent matrix. Show that for an idempotent matrix A,

(a) tr A = rank of A ;

(b) the only non-singular idempotent matrix is I.

1.35. If A is idempotent and if A + B = I, show that

(a) B is idempotent,

(b) AB = BA = 0.
1.36. If X is an n×1 vector with elements x_1, ..., x_n and if A is an n×1 vector with elements a_1, ..., a_n and if Y = X'A then the derivative of Y with respect to X, denoted by ∂Y/∂X, is defined as

$$\frac{\partial Y}{\partial X} = \begin{bmatrix} \partial Y/\partial x_1 \\ \partial Y/\partial x_2 \\ \vdots \\ \partial Y/\partial x_n \end{bmatrix}$$

then show that

(1) ∂Y/∂X = A,

(2) generalize this to the case where A is a matrix ;

(3) ∂Y/∂X = 2X'A where Y = X'AX and A is a symmetric matrix ;

(4) ∂Y/∂A = 2XX' − D(XX') where Y = X'AX is a quadratic form and D(XX') is a diagonal matrix whose diagonal elements are the diagonal elements of XX' and ∂Y/∂A is defined as the matrix with (i, j)th element ∂Y/∂a_{ij}. (Assume symmetry for A).

1.37. Let Y = θA' + e where Y and e are 1×n vectors, θ is a 1×p vector and A' is a p×n matrix of rank p. Show that the value θ̂ of θ for which ee' = Σ e_i² is minimized with respect to θ, is given by θ̂ = YA(A'A)^{-1}.
CHAPTER 2


PROBABILITY 


2-0. Introduction. In day to day life we make statements 
like, it is very likely that Kumary Latha will win the coming 
music competition, tomorrow will probably be a sunny day, the 
chances are almost nil that a man will live for ever, drug X may 
be more effective than drug Y in curing disease Z etc. In all these 
statements there is an element of lack of certainty. Statistics and 
especially the theory of Probability have a vital role in making 
decisions in the face of lack of certainty. Probability theory is 
mainly developed to describe and in some sense measure lack of 
certainties in situations where there is an element of uncertainty. 
There are three basic problems in the theory, namely, 

(1) to describe the situation clearly or to specify a set on 
which probability statements are made. 

(2) to define a numerical measure for a probability statement 

and 


(3) to evaluate numerically the probabilities in specific 
situations or for particular events. 

The palmists, astrologers and fortune-tellers of ancient India
might have used a record of past events to predict the future.
The recorded evidence, however, is that present day probability
theory developed systematically as a theory of games of chance in
the 17th century, when some ardent gamblers consulted
mathematicians about dividing the stake money in cases of
incomplete games.


2.10. Number of Vectors and Subsets from a given
Set. Even though this section is not very important in defining
probability, it is useful in solving some problems of a probabilistic
nature. In Chapter 1 we were considering sets, subsets, special
types of sets, vectors, matrices etc. We did not consider the
number of possible subsets and possible vectors from a given set.
Consider the word PET. Let us consider all possible three letter
words that can be made from the letters of this word. The words
are PET, PTE, EPT, ETP, TEP and TPE. There are six different
words possible. If the order in which the letters occur is not
important there is only one arrangement, because all the words
have the same letters. Let us take the word CUTE and consider
all two letter words possible from this word. The words are given
below. There are 12 words. If the order is not important there
are only six arrangements.


    PET                  CUTE
    3 letter words       2 letter words

    PET                  CU    UT
    PTE                  UC    TU
    EPT                  CT    UE
    ETP                  TC    EU
    TEP                  CE    TE
    TPE                  EC    ET

    6                    12


The process of forming 3 letter words from the letters P, E, 
T, may be considered to be a problem of filling three boxes with 
three different objects. After having filled the first box the second 
may be filled with one of the remaining two objects. Similarly the 
third box may be filled up by the remaining object. There are 
altogether 3 x 2 x 1 = 6 ways. This problem may also be considered 
to be a problem of determining the ordered sets of distinct ele¬ 
ments from a set of distinct elements P, E and T. 
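The listing of words above can be reproduced mechanically. The sketch below uses Python's standard `itertools` module; the variable names are illustrative only and are not taken from the text.

```python
from itertools import combinations, permutations

# Ordered selections (the order of the letters matters).
three_letter_words = ["".join(p) for p in permutations("PET", 3)]
two_letter_words = ["".join(p) for p in permutations("CUTE", 2)]

# Unordered selections (CU and UC count as one arrangement).
two_letter_subsets = ["".join(c) for c in combinations("CUTE", 2)]

print(three_letter_words)
print(len(three_letter_words), len(two_letter_words), len(two_letter_subsets))
```

The three counts agree with the 6, 12 and 6 found by direct enumeration in the text.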

2.11. Number of Vectors from a given Set. The total
number of possible vectors of order r with distinct elements from
a set of n distinct elements may be defined as the number of
permutations of n objects taken r at a time ; or it may also be
defined as the number of ways in which r distinct objects may be
selected out of n different objects, taking into consideration the
order in which the elements are taken. This number is called the
number of permutations of n objects taking r at a time. The
procedure of arranging or selecting the objects in some order is
called a permutation. The number of permutations is usually
denoted by $(n)_r$. Other notations are $nP_r$, P(n, r) etc.

Theorem 2.11.1. $(n)_r = n(n-1)(n-2)\cdots(n-r+1)$

Proof. Let the given set be $S = \{a_1, a_2, \ldots, a_n\}$ where all the a's are distinct. Let $V = (b_1, b_2, \ldots, b_r)$ where $b_i \in S$ for all i and no two of the b's are the same, or $b_1, b_2, \ldots, b_r$ are some r of the a's from S. The total number of V's may be determined as follows. One of $a_1, a_2, \ldots, a_n$ may be taken as $b_1$; this can be done in n ways. After having selected $b_1$ there are (n−1) a's left. Out of these one may be selected for $b_2$. There are (n−1) ways of doing this since there are (n−1) a's left, etc. Therefore the total number of V's, i.e.,

$$(n)_r = n(n-1)(n-2)\cdots(n-r+1) = \frac{n(n-1)(n-2)\cdots(n-r+1)(n-r)\cdots 2\cdot 1}{(n-r)\cdots 2\cdot 1} = \frac{n!}{(n-r)!}$$

(where n !, read as n factorial or factorial n, $= 1\cdot 2\cdot 3\cdots n = n(n-1)\cdots 2\cdot 1$).

Ex. 2.11.1. Let S = {1, 0, −1, 5}.

Find out the total number of permutations taking 3 at a time.

Solution. Here n = 4 and r = 3 and according to the formula
$(n)_r = (4)_3 = 4 \times 3 \times 2 = 24 = 4!/1!$

The vectors are given below :

    V1 = (1, 0, −1)    V7  = (1, 0, 5)    V13 = (1, −1, 5)    V19 = (0, −1, 5)
    V2 = (1, −1, 0)    V8  = (1, 5, 0)    V14 = (1, 5, −1)    V20 = (0, 5, −1)
    V3 = (0, 1, −1)    V9  = (0, 1, 5)    V15 = (−1, 1, 5)    V21 = (−1, 0, 5)
    V4 = (0, −1, 1)    V10 = (0, 5, 1)    V16 = (−1, 5, 1)    V22 = (−1, 5, 0)
    V5 = (−1, 1, 0)    V11 = (5, 1, 0)    V17 = (5, 1, −1)    V23 = (5, 0, −1)
    V6 = (−1, 0, 1)    V12 = (5, 0, 1)    V18 = (5, −1, 1)    V24 = (5, −1, 0)


Comments. If r = 4, that is, if we consider all possible
vectors of order 4 then the number is easily seen to be

4 ! = 4 × 3 × 2 × 1 = 24.

In general

$$(n)_n = \frac{n!}{(n-n)!} = \frac{n!}{0!} = n! \qquad (0! = 1 \text{ is assumed}).$$


Ex. 2.11.2. Find the number of distinct four letter words
that can be made using all the letters of the word “doll”.

Solution. If the two l's are distinct like $l_1$, $l_2$ then the
total number of words is evidently 4 ! = 24. The two l's may be
arranged in 2 ! ways. Since $l_1 = l_2 = l$ the total number of distinct
words $= \dfrac{4!}{2!} = 12$.

Comments. If the elements of a set are not distinct, in
some sense $(n)_r$ can be defined by a modification seen from this
example. The number of distinct vectors can always be
determined even if in the original set some elements are repeated.

Ex. 2.11.3. Find the number of distinct words that can be
made using all the letters of the word “Mississippi”.

Solution. If the four i's in the word are denoted by $i_1$, $i_2$, $i_3$ and $i_4$, the four s's by $s_1$, $s_2$, $s_3$ and $s_4$ and the 2 p's by $p_1$ and $p_2$ then the total number of words = 11 ! The 4 i's can be arranged in 4 ! ways, the four s's in 4 ! ways, the 2 p's in 2 ! ways and the one m in 1 ! way. Therefore the total number of distinct words is

$$\frac{11!}{4!\,4!\,2!\,1!} = 34650.$$
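The rule used in Ex. 2.11.2 and Ex. 2.11.3 (divide n ! by the factorial of the number of repetitions of each letter) can be sketched as a small function. The name `distinct_arrangements` is a hypothetical helper chosen for illustration, and the brute-force check is included only to show that the formula agrees with direct enumeration for a short word.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_arrangements(word):
    """Number of distinct words using all the letters of `word`."""
    total = factorial(len(word))
    for repeats in Counter(word.lower()).values():
        # Letters of one kind can be permuted among themselves
        # in repeats! ways without changing the word.
        total //= factorial(repeats)
    return total


print(distinct_arrangements("doll"))
print(distinct_arrangements("Mississippi"))

# Brute-force check for the short word only.
brute_force = len(set(permutations("doll")))
```

Enumeration quickly becomes impractical (11 ! orderings for “Mississippi”), which is why the formula is preferred.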
2.12. The number of Subsets from a given Set. In
permutations the order of the elements of a set is taken into
consideration. If the order is not considered the number of ways in
which a set of two letters can be taken from a set S = {a, b, c} of
three letters is evidently 3. The different subsets are {a, b},
{b, c}, {c, a}. Here {a, b} and {b, a} represent the same subset.
The number of ways in which subsets of order r can be taken
from a set of order n is called the number of combinations of n
taken r at a time and is usually denoted by

$\binom{n}{r}$, $nC_r$, nCr, C(n, r) etc.


Theorem 2.12.1.

$$\binom{n}{r} = \frac{(n)_r}{r!} = \frac{n!}{r!\,(n-r)!}$$

Proof. A total number of r ! vectors or ordered sets may
be formed from a set of order r. Therefore the total number of
subsets, without considering the order,

$$= \frac{(n)_r}{r!} = \frac{n!}{r!\,(n-r)!}.$$

Theorem 2.12.2. (1) $\binom{n}{1} = n$, (2) $\binom{n}{n} = 1$, for n a positive integer.

Proof. $\binom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$ ; when r = 1,

$$\binom{n}{1} = \frac{n!}{1!\,(n-1)!} = \frac{n(n-1)\cdots 2\cdot 1}{(n-1)\cdots 2\cdot 1} = n,$$

and when r = n,

$$\binom{n}{n} = \frac{n!}{n!\,(n-n)!} = \frac{n!}{n!\,0!} = \frac{n!}{n!} = 1.$$

Theorem 2.12.3. $\binom{n}{r} = \binom{n}{n-r}$ for n a positive integer and r = 0, 1, 2, ..., n.

Proof. $\binom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$. If r is replaced by n − r, $\binom{n}{r}$ becomes

$$\binom{n}{n-r} = \frac{n!}{(n-r)!\,(n-n+r)!} = \frac{n!}{(n-r)!\,r!} = \frac{n!}{r!\,(n-r)!} = \binom{n}{r}.$$

For example

$$\binom{4}{3} = \binom{4}{4-3} = \binom{4}{1} = 4.$$

(s )-*■ 

This shows that there is symmetry among the various
combinations. The number of combinations of n taking one at a time
is the same as the number of combinations of n taking (n−1) at a
time etc.


Theorem 2.12.4. $\binom{n-1}{r-1} + \binom{n-1}{r} = \binom{n}{r}$ for a positive integer n and for r = 1, 2, ..., (n−1).

Proof.

$$\binom{n-1}{r} = \frac{(n-1)!}{r!\,(n-1-r)!} = \frac{(n-1)!}{r\,(r-1)!\,(n-1-r)!}$$

$$\binom{n-1}{r-1} = \frac{(n-1)!}{(r-1)!\,(n-1-r+1)!} = \frac{(n-1)!}{(r-1)!\,(n-1-r+1)\,[(n-1-r)!]}$$

Therefore

$$\binom{n-1}{r-1} + \binom{n-1}{r} = \frac{(n-1)!}{(r-1)!\,(n-1-r)!}\left[\frac{1}{r} + \frac{1}{n-1+1-r}\right] = \frac{n\,(n-1)!}{r\,(r-1)!\,(n-r)\,(n-1-r)!} = \frac{n!}{r!\,(n-r)!} = \binom{n}{r}.$$
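The identities of this section can be spot-checked numerically. The sketch below uses `math.comb`, `math.perm` and `math.factorial` from Python's standard library (available from Python 3.8); the function name `check_identities` is hypothetical, and the last identity is taken in the form that the proof above derives, namely that the sum of the combinations of (n−1) taken (r−1) and r at a time equals the combinations of n taken r at a time.

```python
from math import comb, factorial, perm


def check_identities(n):
    """Return True if the identities of Section 2.12 hold for this n."""
    for r in range(n + 1):
        # Theorem 2.12.1: combinations = permutations / r!
        if comb(n, r) != perm(n, r) // factorial(r):
            return False
        # Theorem 2.12.3: symmetry between r and n - r
        if comb(n, r) != comb(n, n - r):
            return False
    # Theorem 2.12.2: the cases r = 1 and r = n
    if comb(n, 1) != n or comb(n, n) != 1:
        return False
    # Theorem 2.12.4: the addition rule, for r = 1, ..., n - 1
    for r in range(1, n):
        if comb(n - 1, r - 1) + comb(n - 1, r) != comb(n, r):
            return False
    return True


results = [check_identities(n) for n in range(1, 21)]
print(all(results))
```

A numerical check of this kind is not a proof; it only guards against slips in the algebra.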

Ex. 2.12.1. In how many ways can a set of 4 air hostesses
be selected from a band of 20 beautiful girls ?

Solution. Here the order in which the air hostesses are
selected is not important. Therefore this is only a case of
combination of 20 taking 4 at a time. The answer is

$$\binom{20}{4} = \frac{20\cdot 19\cdot 18\cdot 17}{1\cdot 2\cdot 3\cdot 4} = 4845.$$


Ex. 2.12.2. In a set of 100 birds 70 weigh more than 5 lbs
and the rest weigh less than 5 lbs. In how many ways can a sample of
10 be selected so that the sample contains 6 birds weighing more than
5 lbs ?

Solution. Here the set of birds weighing more than 5 lbs
contains 70 elements. In the sample 6 birds should weigh more
than 5 lbs and 4 birds should weigh less than 5 lbs. Selection of
6 birds weighing more than 5 lbs amounts to the selection of a
subset of size 6 from a set of size 70. This can be done in
$\binom{70}{6}$ ways. Similarly 4 birds from 30 birds can be selected in
$\binom{30}{4}$ ways. Therefore the total number of ways in which a
sample of 10 containing 6 birds weighing more than 5 lbs and 4
birds weighing less than 5 lbs can be taken is

$$\binom{70}{6}\binom{30}{4} \text{ ways.}$$

Ex. 2.12.3. A deck of 52 playing cards contains 4 different
suits of 13 cards each. In how many ways can a hand of 8 cards be
selected so that there are 4 clubs, 3 hearts, one diamond and no
spades ?

Solution. By reasoning similar to that of the previous
example the answer is $\binom{13}{4}\binom{13}{3}\binom{13}{1}\binom{13}{0} = 2658370$.
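The three selection problems above reduce to products of combinations, which can be evaluated directly. This is a minimal sketch using `math.comb`; the variable names are illustrative, and the bird example assumes the split of 70 and 30 birds that the solution works with.

```python
from math import comb

# Ex. 2.12.1: 4 air hostesses out of 20, order not important.
air_hostess_selections = comb(20, 4)

# Ex. 2.12.2: 6 birds from the 70 heavier ones and 4 from the remaining 30.
bird_samples = comb(70, 6) * comb(30, 4)

# Ex. 2.12.3: 4 clubs, 3 hearts, 1 diamond and no spades.
card_hands = comb(13, 4) * comb(13, 3) * comb(13, 1) * comb(13, 0)

print(air_hostess_selections, bird_samples, card_hands)
```

The counts multiply because each choice from one group can be paired with every choice from the other groups.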

Ex. 2.12.4. In how many different ways can 4 people be
seated (a) in a row, (b) in a circle ?

Solution. (a) If the people are A, B, C and D and if they
are arranged in a row the first position can be taken by A or B or
C or D. The first position can be filled up in 4 ways. Similarly
the second position in 3 ways, the third in 2 ways and the 4th in
one way. So the total number of arrangements is 4 × 3 × 2 × 1 = 4 !
= 24.

(b) When arranged in a circle all the permutations ABCD,
BCDA, CDAB, DABC give the same arrangement. Therefore the
number of different seatings = 4 !/4 = 3 ! = 6.

and that “)*?"“ ‘ Pe ° P ’ e f ” <“> is 4 ! 

ifnt nCt ° f ** r 6e 

Solution. This is only a case of the number of
combinations of n things taken r at a time. Hence the answer is $\binom{n}{r}$.

2.13. Stirling’s Formula. According to the definition

    when n = 5,    n ! = 120
         n = 8,    n ! = 40320
         n = 10,   n ! = 3628800

When n is large, n ! becomes very large. For convenience of
computation an approximation for n ! is given by Stirling’s
formula

$$n! \approx (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n} \qquad (2.13.1)$$

where ≈ means ‘approximately equal to’, and π and e have the
usual meanings (π is approximately 3.1416 and e is approximately
equal to 2.71828, where e is the base of natural logarithms).

A closer approximation for n ! is

$$n! \approx (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n+1/(12n)} \qquad (2.13.2)$$
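The quality of the two approximations can be examined numerically. The sketch below compares them with the exact factorial; the function names `stirling` and `stirling_refined` are hypothetical labels for formulas (2.13.1) and (2.13.2), and floating point arithmetic limits the comparison to moderate n.

```python
from math import exp, factorial, pi, sqrt


def stirling(n):
    """Formula (2.13.1): (2*pi)^(1/2) * n^(n + 1/2) * e^(-n)."""
    return sqrt(2 * pi) * n ** (n + 0.5) * exp(-n)


def stirling_refined(n):
    """Formula (2.13.2): the exponent -n is replaced by -n + 1/(12n)."""
    return sqrt(2 * pi) * n ** (n + 0.5) * exp(-n + 1 / (12 * n))


for n in (5, 8, 10):
    exact = factorial(n)
    error_basic = abs(stirling(n) - exact) / exact
    error_refined = abs(stirling_refined(n) - exact) / exact
    print(n, exact, f"{error_basic:.2%}", f"{error_refined:.5%}")
```

In both cases the relative error shrinks as n grows, with the refined formula being much closer at every n tried here.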

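Ex. 2.13.1. Using Stirling's approximation prove that

$$\binom{2n}{n} \approx \frac{2^{2n}}{(\pi n)^{1/2}}$$

Proof.

$$n! \approx (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}$$

$$\binom{2n}{n} = \frac{(2n)!}{n!\,(2n-n)!} = \frac{(2n)!}{(n!)^2}$$

$$(2n)! \approx (2\pi)^{1/2}\,(2n)^{2n+1/2}\, e^{-2n}$$

$$(n!)^2 \approx \left[(2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}\right]^2 = (2\pi)\, n^{2n+1}\, e^{-2n}$$

$$\frac{(2n)!}{(n!)^2} \approx \frac{(2\pi)^{1/2}\,(2n)^{2n+1/2}\, e^{-2n}}{(2\pi)\, n^{2n+1}\, e^{-2n}} = \frac{2^{2n+1/2}}{(2\pi)^{1/2}\, n^{1/2}} = \frac{2^{2n}}{(\pi n)^{1/2}}$$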

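Ex. 2.13.2. If Γ(x) (read as gamma x) is defined as (x−1) !
for a positive integral value of x, show that

$$\Gamma(x) \approx (2\pi)^{1/2}\, e^{-x}\, x^{x-1/2}$$

when x is a positive integer.

Proof.

$$n! \approx (2\pi)^{1/2}\, n^{n+1/2}\, e^{-n}$$

$$\Gamma(x) = (x-1)! \approx (2\pi)^{1/2}\,(x-1)^{(x-1)+1/2}\, e^{-(x-1)}$$

$$= (2\pi)^{1/2}\,(x-1)^{x-1/2}\, e^{-x+1}$$

$$= (2\pi)^{1/2}\, x^{x-1/2}\left(1-\frac{1}{x}\right)^{x}\left(1-\frac{1}{x}\right)^{-1/2} e^{-x+1}$$

When x is sufficiently large $\frac{1}{x} \to 0$ (tends to zero) and thereby
$\left(1-\frac{1}{x}\right)^{-1/2} \to 1$ ; $\left(1-\frac{1}{x}\right)^{x} \to e^{-1}$, since $\left(1+\frac{y}{x}\right)^{x} \to e^{y}$
when x → ∞, for a finite y. Therefore

$$\Gamma(x) = (x-1)! \approx (2\pi)^{1/2}\, x^{x-1/2}\, e^{-x}.$$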

Comments. In this example x is assumed to be a positive
integer. But Γ(x) is defined for non-integral values of x also. A
reader who is not familiar with limits and gamma functions is
advised to read any elementary book on Calculus for ‘limits’ and
J.M.H. Olmsted, Calculus with Analytical Geometry vol. 1 & 2,
Appleton-Century-Crofts Publishers, New York for Gamma
functions.

2.14. The Binomial Coefficients.

Consider the following equations :

$$(P_1+P_2)^0 = 1$$

$$(P_1+P_2)^1 = P_1 + P_2 = \binom{1}{0}P_2^0P_1^1 + \binom{1}{1}P_2^1P_1^0$$

$$(P_1+P_2)^2 = P_1^2 + 2P_1P_2 + P_2^2 = \binom{2}{0}P_2^0P_1^2 + \binom{2}{1}P_2^1P_1^1 + \binom{2}{2}P_2^2P_1^0$$

$$(P_1+P_2)^3 = P_1^3 + 3P_1^2P_2 + 3P_2^2P_1 + P_2^3 = \binom{3}{0}P_2^0P_1^3 + \binom{3}{1}P_2^1P_1^2 + \binom{3}{2}P_2^2P_1^1 + \binom{3}{3}P_2^3P_1^0$$

In general it may be easily shown that, for any positive
integer n,

$$(P_1+P_2)^n = \binom{n}{0}P_2^0P_1^n + \binom{n}{1}P_2^1P_1^{n-1} + \cdots + \binom{n}{r}P_2^rP_1^{n-r} + \cdots + \binom{n}{n}P_2^nP_1^0.$$

These types of expansions are called Binomial (of two)
expansions and the coefficients of $P_2^0P_1^n$, $P_2^1P_1^{n-1}$, ..., $P_2^nP_1^0$
are seen to be $\binom{n}{0}$, $\binom{n}{1}$, $\binom{n}{2}$, ..., $\binom{n}{n}$ respectively. These
coefficients are called Binomial coefficients. These binomial
coefficients may also be constructed from a special arrangement of
numbers called the Pascal triangle. The pattern of numbers
called Pascal’s triangle is given below.

                1
              1   1
            1   2   1
          1   3   3   1
        1   4   6   4   1
              etc.
In this arrangement each row starts with one and ends with
one. The first row has one element, the second row has two
elements etc. Any number in any row is obtained by adding up
two consecutive numbers in the preceding row and writing the
sum in the next row mid-way between the numbers added up. In
this arrangement the second row elements are the coefficients in
the expansion of $(P_1+P_2)^1$, the third row elements are the
coefficients in the expansion of $(P_1+P_2)^2$ etc. It can be proved by the
method of induction that

$$(1+x)^n = 1 + \binom{n}{1}x + \binom{n}{2}x^2 + \cdots + \binom{n}{r}x^r + \cdots$$

for n a positive integer, negative integer, positive fraction or a
negative fraction, where $\binom{n}{r} = \dfrac{n(n-1)\cdots(n-r+1)}{r!}$ and | x | < 1.
When n is a finite positive integer the expansion is valid for any
finite x.
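The rule for building Pascal's triangle (each inner number is the sum of the two numbers above it, and every row starts and ends with one) translates directly into code. The function name `pascal_rows` is a hypothetical helper; `math.comb` is used only to confirm that the rows are the binomial coefficients.

```python
from math import comb


def pascal_rows(count):
    """Return the first `count` rows of Pascal's triangle."""
    rows = []
    row = [1]
    for _ in range(count):
        rows.append(row)
        # Add up consecutive pairs of the current row to get the
        # inner entries of the next row, then put 1 at both ends.
        inner = [row[i] + row[i + 1] for i in range(len(row) - 1)]
        row = [1] + inner + [1]
    return rows


triangle = pascal_rows(6)
for r in triangle:
    print(r)

# Row number n + 1 holds the coefficients of the n-th power expansion.
matches = all(
    triangle[n] == [comb(n, r) for r in range(n + 1)]
    for n in range(len(triangle))
)
```

The agreement with `comb` is the addition rule of Theorem 2.12.4 applied row by row.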


Ex. 2.14.1. Find the total number of samples of sizes 0, 1, 2,
..., n that can be taken from a population of size n.

Solution. A sample of size r from a population of size n
can be selected in $\binom{n}{r}$ ways. We are asked to find out

$$\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = \sum_{r=0}^{n}\binom{n}{r}.$$

Consider the expansion

$$(P_1+P_2)^n = \binom{n}{0}P_2^0P_1^n + \binom{n}{1}P_2^1P_1^{n-1} + \cdots + \binom{n}{n}P_2^nP_1^0.$$

Put $P_1 = P_2 = 1$ ; then

$$(1+1)^n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}.$$

Hence the answer is $2^n$.


Ex. 2.14.2. Show that

$$\binom{m+n}{r} = \sum_{k=0}^{r}\binom{m}{k}\binom{n}{r-k}$$

Solution.

$$(1+x)^{m+n} = (1+x)^m\,(1+x)^n$$

But

$$(1+x)^m = \binom{m}{0} + \binom{m}{1}x + \binom{m}{2}x^2 + \cdots$$

and

$$(1+x)^n = \binom{n}{0} + \binom{n}{1}x + \binom{n}{2}x^2 + \cdots$$

Comparing the coefficients of $x^r$ on both sides, the result follows.
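Both results of this section, the total of 2 to the power n samples in Ex. 2.14.1 and the convolution identity of Ex. 2.14.2 (taken here in the form: combinations of m+n taken r at a time equal the sum over k of the combinations of m taken k times those of n taken r−k), can be checked for small values. The function names are hypothetical; `math.comb` returns 0 when the lower index exceeds the upper one, which handles terms that vanish.

```python
from math import comb


def total_samples(n):
    """Sum of the numbers of samples of sizes 0, 1, ..., n (Ex. 2.14.1)."""
    return sum(comb(n, r) for r in range(n + 1))


def convolution_sum(m, n, r):
    """Right-hand side of the identity in Ex. 2.14.2."""
    return sum(comb(m, k) * comb(n, r - k) for k in range(r + 1))


powers_ok = all(total_samples(n) == 2 ** n for n in range(12))
identity_ok = all(
    convolution_sum(m, n, r) == comb(m + n, r)
    for m in range(7)
    for n in range(7)
    for r in range(m + n + 1)
)
print(powers_ok, identity_ok)
```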



2.15. The Multinomial Coefficients. The binomial
coefficient $\binom{n}{r} = \dfrac{n!}{r!\,(n-r)!}$ (when n is a positive integer) may
be written as $\binom{n}{r} = \dfrac{n!}{r_1!\,r_2!}$ where $r_1 + r_2 = n$, or $\dfrac{n!}{r_1!\,r_2!}$ may
be considered to be the coefficient of $P_1^{r_1}P_2^{r_2}$ in the expansion of
$(P_1+P_2)^n$ where n is a positive integer, and the whole expansion
may be written as

$$\sum_{\substack{r_1=0 \\ r_1+r_2=n}}^{n}\ \sum_{r_2=0}^{n} \frac{n!}{r_1!\,r_2!}\, P_1^{r_1}P_2^{r_2}$$

where the particular notation means that the summation
is subject to the condition $r_1 + r_2 = n$.

This result can be easily generalized to a multinomial
expansion for an expression $(P_1+P_2+\cdots+P_k)^n$. In this expansion the
coefficient of $P_1^{r_1}P_2^{r_2}\cdots P_k^{r_k}$, where $r_1+r_2+\cdots+r_k = n$,
is called a multinomial coefficient. The expansion may be given as

$$\sum_{\substack{r_1=0 \\ r_1+r_2+\cdots+r_k=n}}^{n}\ \sum_{r_2=0}^{n}\cdots\sum_{r_k=0}^{n} \frac{n!}{r_1!\,r_2!\cdots r_k!}\, P_1^{r_1}P_2^{r_2}\cdots P_k^{r_k}.$$

Analogous to the binomial coefficient $\dfrac{n!}{r_1!\,r_2!}$ the
multinomial coefficient is $\dfrac{n!}{r_1!\,r_2!\cdots r_k!}$ where $r_1+r_2+\cdots+r_k = n$
and n is a positive integer. This may also be considered to be the
number of ways of getting $r_1$ $P_1$'s, $r_2$ $P_2$'s, ..., $r_k$ $P_k$'s, such that
$r_1+r_2+\cdots+r_k = n$. Another notation for the multinomial
coefficient $\dfrac{n!}{r_1!\,r_2!\cdots r_k!}$ is $\dbinom{n}{r_1, r_2, \ldots, r_k}$.

It is easily seen that

$$\frac{n!}{r_1!\,r_2!\cdots r_k!} = \binom{n}{r_1}\binom{n-r_1}{r_2}\binom{n-r_1-r_2}{r_3}\cdots\binom{r_k}{r_k}$$

that is, the multinomial coefficient may also be considered to be
the number of ways in which a set of n elements can be arranged
into an ordered set of k subsets having $r_1, r_2, \ldots, r_k$ elements
respectively, where $r_1+r_2+\cdots+r_k = n$. It is also seen to be the
number of permutations of n objects taking n at a time, in which
$r_1$ are of one type, $r_2$ of a second type, ..., $r_k$ are of a k-th type,
where $r_1+r_2+\cdots+r_k = n$.
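The two descriptions of the multinomial coefficient given above (n ! divided by the factorials of the group sizes, and a product of successive binomial coefficients) can be compared in code. The function names `multinomial` and `multinomial_via_binomials` are hypothetical helpers written for this illustration.

```python
from math import comb, factorial


def multinomial(*group_sizes):
    """n! / (r_1! r_2! ... r_k!) with n the sum of the group sizes."""
    result = factorial(sum(group_sizes))
    for r in group_sizes:
        result //= factorial(r)
    return result


def multinomial_via_binomials(*group_sizes):
    """Choose r_1 out of n, then r_2 out of the n - r_1 left, and so on."""
    remaining = sum(group_sizes)
    result = 1
    for r in group_sizes:
        result *= comb(remaining, r)
        remaining -= r
    return result


print(multinomial(4, 6))               # two groups: a binomial coefficient
print(multinomial(4, 4, 2, 1))         # the letters of "Mississippi"
print(multinomial_via_binomials(4, 4, 2, 1))
```

With two groups the function reduces to the ordinary binomial coefficient, which is the sense in which the multinomial coefficient generalizes it.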

Ex. 2.15.1. A coin is tossed 10 times. What is the total
possible number of ways in which 4 heads and 6 tails can come up ?

Solution. This can be obtained from a binomial expansion
of the form $(P_1+P_2)^{10}$. The various coefficients in the expansion
give the various possibilities of zero $P_2$ and 10 $P_1$, one $P_2$ and 9 $P_1$,
..., ten $P_2$ and zero $P_1$. Therefore the coefficient of $P_1^4P_2^6$
gives the required number of ways in which 4 heads and 6 tails
can come up and is

$$\binom{10}{4} = \frac{10!}{4!\,6!} = \frac{10\cdot 9\cdot 8\cdot 7}{1\cdot 2\cdot 3\cdot 4} = 210.$$


Ex. 2.15.2. A die is rolled 12 times. In how many ways can
one get two 1's, two 2's, one 3, four 4’s, two 5’s and one 6 ?

Solution. This is similar to the previous example and the
required number is the coefficient of $P_1^2P_2^2P_3^1P_4^4P_5^2P_6^1$ in
the expansion of $(P_1+P_2+P_3+P_4+P_5+P_6)^{12}$. The answer is,

$$\frac{12!}{2!\,2!\,1!\,4!\,2!\,1!} = \binom{12}{2, 2, 1, 4, 2, 1} = 2494800.$$


Exercises

2.1. Find the numbers of vectors and subsets that can be obtained
from the set

(a) A = {0, 1, 2, 3},

(b) B = {−1, 5, 0}.

With soft o/h^^crtTsts^an^with sujgar ^ntent^l^^oF'o^s^ 11 

distinct types of pies can she make ? & 09 0 T 6 ' How raan y 

2*3, A four letter word is to be made with r 

woMtanotAT mMy di8tin °* W ° rdS ° an b6 m ” de 

2.4. In how many different ways can 4 people be seated
(1) in a row,
(2) in a circle,

(3) in a row of 6 seats ?

2.5. How many distinct words can be made using all the letters of the
word panamanian ?

2-6 A student’s council has 5 members. How many different ways 
can both a programme committee and a reception committee, each with two 

members, be made, if 

(a) nobody is allowed to sit on both the committees, 

(b) with no restriction ? 


2.7. In how many ways can

(1) an athletic team of 5 be selected from a set of 30 sportsmen,

(2) a team of 5 be selected with Shri Kumar of the set in the
selected team ?

2.8. A secretary puts letters addressed to 4 people into the envelopes
which are already addressed to the 4 people. In how many ways can she put
the letters in the envelopes, so that nobody receives the letter addressed to
him ? (This problem is often called the matching problem.) Prove that in
the general case when there are k envelopes and k letters the answer is

$$k!\left[\frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^k}{k!}\right].$$

2.9. If the outcomes head and tail, when a coin is tossed once, are
denoted by H and T, we may construct a diagram which may be called a
tree diagram to compute the number of possible outcomes when a coin is
tossed a number of times. This diagrammatic representation is helpful in many
problems. Such a diagram for the experiment of throwing a coin twice is
given below.

[Tree diagram for two tosses of a coin.]

Using a tree diagram determine the total number of outcomes when a
die is rolled three times.

2.10. In how many ways can a set of 13 cards be selected from 52 cards
such that it contains 4 hearts, 5 clubs, 4 diamonds and a spade ?

2 11. Ku. Ragini knows 5 different Bharat natya (an Indian classical 
dance). In how many different ways can she give the performance for two 
occasions if 

(1) she does two dances for each performance and different dances 
for different performances, 

(2) two dances for each performance but she starts with the same 
dance for every performance ? 

2.12. In a class there are 100 students out of which 20 are highly
intelligent. In how many ways can a committee of 20 be formed so that the
committee contains 4 highly intelligent ones ?

2.13. In how many ways can 6 halwa bars (an Indian sweet
preparation) be distributed among 4 children so that

(1) any child can receive any number of halwa bars,

(2) there should be at least one halwa bar for every child ?

Show that the general formula for the number of ways in which r
indistinguishable articles can be assigned into n cells is $\binom{n+r-1}{r}$. (This
is known as the occupancy problem).

2-14. Show that 

n / n \ „ 

«■> *r_. (- 1 )' ( r J“° ! 

<*> A_. { r ) =( „ ) > 

(°) (-if (” )(,*_,)■=(—i)* (r) &r ; 

( d) Evaluate ^ and ’ 

Cn 2 )- 

2.15. By using Stirling’s formula if necessary,

(a) show that $n! \approx (2\pi)^{1/2}\,(n+1/2)^{n+1/2}\, e^{-(n+1/2)}$,

(b) approximate 15 ! and find the percentage error in the
approximation.

2.16. In how many ways can 6 rolls of a die yield 2 ones, 3 twos,
one 3 and no 4’s, 5’s and 6’s ?

2.2. ALGEBRA OF SETS 

Mathematical operations analogous to the simple mathemati¬ 
cal operations of addition, multiplication etc. on numbers, can be 
defined on sets. 

2.21. Union of Sets. Union or logical sum of two sets A
and B, which is written as A∪B (A union B), is defined as the
set of elements which belong to at least one of the sets A and B or
the set of elements which belong to A or B or both. Other
notations are A+B etc.

Ex. 2.21.1. Let A = {1, 2, 3, 4} ; B = {1, −1, 0, 4}

Here 1, 2, 3, 4 belong to A, 1, −1, 0, 4 belong to B and
further 1 and 4 belong to both A and B. Therefore

A∪B = {1, 2, 3, 4, −1, 0}

Comments. The numbers 1, 2, 3, 4, −1, 0 belong to at least
one of the sets A and B. Here the order in which the elements
are written and the magnitudes of the elements have no
importance. For example, if A = {5} and B = {4} then A∪B = {5, 4}.
The operator ‘union of two sets’ is a binary operation,
that is, an operation connecting two mathematical objects.

Ex. 2.21.2. A = {5, Ku. Sarojam, Mr. Iceberg, 0} ; B = {4, 3,
−1, Tajmahal, Mr. Fox, Shri. Ajaya} then A∪B = {5, Ku. Sarojam,
Mr. Fox, Mr. Iceberg, 0, 4, 3, −1, Tajmahal, Shri. Ajaya}.

Comments. The elements of a set need not be always 
numbers in order to define the operation 'union’ on them. 

Ex. 2.21.3. Let A and B be the spaces enclosed by the closed
curves α and β in Fig. 2.1 respectively ; then A∪B is the shaded
region.

Comments. Representation of sets by diagrams as shown
in Fig. 2.1 is called representation by Venn diagrams. This will
enable the reader to grasp quickly the ideas of sets and union of
sets etc. In a Venn diagrammatic representation a set is
represented by the space enclosed by a closed curve ; thereby the
elements of the set are symbolically represented as points in this
space. This does not mean that the elements of a set can always be
represented as geometrical points in a two dimensional space. The
representation is purely symbolic. For example A and B can be the
sets A and B in Ex. 2.21.2. In this case the space α (alpha) has 4
points and β (beta) has 6 points or 6 elements.

Ex. 2.21.4. A = the set of students who take a particular
mathematics course in a particular university at a particular time.
B = the set of students who take another mathematics course in that
university at that time. Then A∪B is the set of students who take
at least one of the courses under consideration.

Comments. If all the students in one course are also
students in the second course and if no other students take the
second course then A and B are the same, and hence evidently
A∪B = A = B. We can easily prove the general result that A∪A
= A where A is any set.

Ex. 2.21.5. A = the set of rose flowers in Brindawan garden at
a particular time. B = the set of pink rose flowers in Brindawan at
that time. Then A∪B is the set of rose flowers in Brindawan which
are pink or not.

Comments. Here evidently B⊂A and hence A∪B = A. If
there are no pink roses then B = φ and therefore A∪φ = A. These
results hold good for any set A and for any subset B of A.

Fig. 2.1.

The above definition of union of two sets may be extended
to a collection of sets. If A_1, ..., A_n is a collection of n sets then the
union of A_1, ..., A_n (that is, $A_1 \cup A_2 \cup \cdots \cup A_n = \bigcup_{i=1}^{n} A_i$) is that set of
elements which belong to at least one of the sets A_1, ..., A_n.

2.21.6. Let A X ={ 1 . 2, 3} : A z ={0 -1 2\ ■ A. 

= {7 0 1/2}; A 4 = {d,7 } then 4xU^uVu^={0. i. 0 t - 1 , ' 


Comments. From the above examples it is evident that
for any sets A, B, C, (1) A∪B = B∪A (that is, the binary
operation ‘union’ is commutative. An operation P connecting two
mathematical objects a and b is said to be commutative if aPb
= bPa. For example the simple ‘addition’ is commutative whereas
‘subtraction’ is not commutative since a+b = b+a but a−b ≠ b−a
in general), (2) A∪B∪C = B∪C∪A = C∪B∪A, (3) A∪(B∪C)
= (A∪B)∪C, i.e., the operation ‘union’ is associative or in other
words it does not make any difference whether B∪C is done first
or A∪B is done first. The simple operation ‘subtraction’ is not
associative since a−(b−c) ≠ (a−b)−c where a, b, c are real numbers.


Ex. 2.21.7. Consider an experiment of throwing a coin three
times. Let the occurrence of a head be denoted by 1 and that of a tail
by zero. Then the outcome set may be given as

S = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0),
(1, 1, 1)}. Let A be the event of obtaining a total of 2, B be the
event of obtaining a total 1, and C be the event of getting a total 0.
Then

A = {(0, 1, 1), (1, 0, 1), (1, 1, 0)} ; B = {(0, 0, 1), (0, 1, 0), (1, 0, 0)} ; C = {(0, 0, 0)}

and

A∪B∪C = {(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0),
(1, 0, 0), (0, 0, 0)}

gives the event of getting a total of 0 or 1 or 2.

Comments. In this example events A and B or the events 
of getting totals of 1 and 2 can not happen simultaneously. A 
statement of simultaneous occurrence of A and B, in this experi¬ 
ment, has no meaning. So also the events B and C, and also A and 
B and C can not happen simultaneously. 
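The coin-tossing example can be reproduced with Python's built-in `set` type, where `|` is union and `&` is intersection. This is only a sketch: the helper name `total` and the variable names are illustrative, and each outcome is written as a tuple of 0's and 1's as in the text.

```python
from itertools import product

# Outcome set for three tosses: 1 denotes a head, 0 a tail.
S = set(product((0, 1), repeat=3))


def total(k):
    """Event of obtaining a total of k in the three tosses."""
    return {outcome for outcome in S if sum(outcome) == k}


A, B, C = total(2), total(1), total(0)

union_abc = A | B | C          # a total of 0 or 1 or 2
print(len(S), sorted(union_abc))

# Union is commutative and associative.
commutative = (A | B) == (B | A)
associative = (A | (B | C)) == ((A | B) | C)

# A and B have no outcome in common, so they cannot occur together.
common_ab = A & B
print(commutative, associative, common_ab)
```

The only outcome of S missing from the union is (1, 1, 1), the event of getting a total of 3.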


Ex. 2.21.8. Consider an experiment of rolling one die once.
The outcome set may be given as the points shown in Fig. 2.2.

Let A be the event of rolling 1 or 3 and B be the event of
rolling 4 ; then A∪B is the event of rolling 1 or 3 or 4.

[Fig. 2.2. The outcome set shown as the points 1, 2, 3, 4, 5, 6 on a line.]

Comments. If two dice are rolled the outcome set may be
represented as a set of points in a two dimensional space etc.

2.22. Intersection of two Sets. The set of elements which
belong to both A and B is called the intersection of the sets A and
B and is usually denoted by A∩B (A intersection B), AB etc.
Intersection is also called logical product. ‘Intersection’ is
analogous to the operation ‘multiplication’ but it is different from
multiplication.

Ex. 2.22.1. Let A = {2, 4, −7}
B = {2, 5}
then A∩B = {2}

Comments. If A = {6} and B = {7} then A∩B ≠ {6 × 7} = {42}
Ex. 2.22.2. A = {Ajayan, Ku. Lalitha, 8}
B = {Miss Cute, Ku. Lalitha}
then A∩B = {Ku. Lalitha}

Ex. 2.22.3. A = the set of all students in a class
B = the set of all female students in the class.
Then A∩B = B = the set of all female students in the class.

Comments. Here B⊂A and therefore A∩B = B. If there
are no female students then B = φ and evidently A∩φ = φ. If all
the students are female students then B = A and therefore A∩A
= A. So we can easily prove the following results that for any
set A, A∪A = A, A∩A = A, A∪φ = A, A∩φ = φ and if B⊂A then
A∪B = A and A∩B = B.

Ex. 2.22.4. Let A and B be the sets as shown in the Venn
diagram ; then A∩B is the shaded region.

[Venn diagram of A∩B.]

Comments. From the above examples it is evident that
A∩B = B∩A. The same definition of intersection may be
extended to a number of sets. The intersection of the sets A_1, A_2, ..., A_n
(denoted by $A_1 \cap A_2 \cap \cdots \cap A_n = \bigcap_{i=1}^{n} A_i$) may be defined as the set
of elements which belong to all the sets A_1, A_2, ..., A_n.
2.23. Disjoint Sets and Mutually Exclusive Events. A
number of sets A_1, A_2, ..., A_n are said to be pairwise disjoint if
A_i∩A_j = φ for i ≠ j for all i and j, i.e., A_1∩A_2 = φ, A_1∩A_3 = φ,
A_2∩A_3 = φ etc. If A_1, A_2, ..., A_n denote disjoint subsets of an outcome
set then the events A_1, A_2, ..., A_n are said to be mutually exclusive,
i.e., two events A and B are mutually exclusive if A∩B = φ. Here
A and B have no element in common or the occurrence of one
excludes the occurrence of the other.

Ex. 2.23.1. Suppose that the points marked in Fig. 2.4
represent an outcome set ; then the two shaded regions represent two
mutually exclusive events.

[Fig. 2.4. An outcome set S with two shaded regions.]

Comments. Here the shaded portions are not important, 
but only the points in the shaded parts are important. The two 
parts have no common points and hence the corresponding events 
are mutually exclusive. 


Ex. 2.23.2. Consider an experiment of throwing a coin three
times. The outcome set is given by

S = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1),
(1, 1, 0), (1, 1, 1)}.

The event A of getting a total 2 is given by
A = {(0, 1, 1), (1, 0, 1), (1, 1, 0)}.

The event B of getting a total 1 is given by
B = {(0, 0, 1), (1, 0, 0), (0, 1, 0)}.

A∩B = φ since A and B have no element in common. A and
B are mutually exclusive or when we get a total 1 we cannot get
a total 2 at the same time. The two events cannot occur
simultaneously.

Ex. 2.23.3. Let the outcome set S and the events A, B, C, D
be as shown in Fig. 2.5 ; then A, B, C, D are mutually exclusive.

Comments. Here S is partitioned into disjoint sets or this
is a disjoint partition of the outcome set. Evidently

A∪B∪C∪D = S, A∩B = φ, A∩C = φ, A∩D = φ, B∩C = φ etc.
Fig. 2.5.

2.24. Complement of a set A or non-occurrence of an
event A. The set of elements which do not belong to A is
called the complement of A and is usually denoted by Ā. Other
notations are A', A* etc. Similarly the complement of A with
respect to B is the set of points in B which do not belong to A and
is usually denoted by B−A.

Ex. 2.24.1. Let the sets A and S be as shown in Fig. 2-6; then $\bar{A}$ is the shaded portion in Fig. 2-6.

Fig. 2-6.

Comments. The set of all points in the shaded portion gives the non-occurrence of the event A, if S is an outcome set. In this case $A \subset S$ and also $\bar{A} \subset S$.

Ex. 2.24.2. Let A, B, S be given as in Fig. 2-7, then the 
various complements with respect to the outcome set S and with 
respect to events A and B are marked in Fig. 2-7 . 




Fig. 2-7.

Ex. 2.24.3. Let $S = \{x \mid 1 \le x \le 6\}$ be the outcome set, and let

$A = \{x \mid 1 \le x \le 3\}$ be an event,

$B = \{x \mid 2 \le x \le 4\}$ be an event.

Fig. 2-8.

Then

$\bar{A} = \{x \mid 3 < x \le 6\}$ (non-occurrence of the event A)

$\bar{B} = \{x \mid 1 \le x < 2 \text{ or } 4 < x \le 6\}$ (non-occurrence of the event B)

$A \cup B = \{x \mid 1 \le x \le 4\}$ (event of the occurrence of at least one of the events A and B)

$A \cap B = \{x \mid 2 \le x \le 3\}$ (event of simultaneous occurrence of A and B)

$A \cup \bar{A} = \{x \mid 1 \le x \le 6\} = S$ (event that either A occurs or A does not occur. This is evidently a sure event).

Comments. It may be easily verified that $A \cup S = S$, $A \cap S = A$, $A \cup (A \cap B) = A$. The following results are generally true. If A is any event, (1) $A \cup \bar{A} = S$, $A \cup S = S$, $A \cap S = A$.


Ex. 2.24.4. Consider an experiment of throwing a coin three times. The outcome set S may be given as

$$S = \{(0,0,0),\ (0,0,1),\ (0,1,0),\ (0,1,1),\ (1,0,0),\ (1,0,1),\ (1,1,0),\ (1,1,1)\}.$$

Let A be the event of getting a total 2 and B be the event of getting a total 1, then

$$A = \{(0,1,1),\ (1,0,1),\ (1,1,0)\}$$

$$B = \{(0,0,1),\ (0,1,0),\ (1,0,0)\}$$

$\bar{A}$ = the set of vectors other than the vectors in A = the event of getting a total 0 or 1 or 3, or $\bar{A}$ is the event of getting a total not equal to 2.

Comments. It can be easily verified that $A \cap B = \phi$.

Ex. 2.24.5. Let A, B, C be the events as shown in the Venn diagram in Fig. 2-9; then $A \cap B \cap C$ is given by the shaded region.

Fig. 2-9.

Comments. The following results may be easily verified from Fig. 2-9.

(1) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$,

(2) $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$,

(3) $\overline{A \cup B} = \bar{A} \cap \bar{B}$,

(4) $\overline{A \cap B} = \bar{A} \cup \bar{B}$,

(5) $\overline{A \cup B \cup C} = \bar{A} \cap \bar{B} \cap \bar{C}$,

(6) $\overline{A \cap B \cap C} = \bar{A} \cup \bar{B} \cup \bar{C}$,

and all these results can be easily generalized to any finite number of sets.
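The six results above may also be verified by brute force on a small outcome set. The following is only a sketch: the four-point outcome set `S` and the helper names are hypothetical and chosen for illustration. It checks every triple of subsets of S.

```python
from itertools import combinations

S = frozenset({1, 2, 3, 4})  # a small hypothetical outcome set

def complement(A, universe=S):
    """Complement of A with respect to the outcome set."""
    return universe - A

def all_subsets(universe):
    """Yield every subset of the universe as a frozenset."""
    items = sorted(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def identities_hold(A, B, C):
    """Check results (1) to (6) for one particular choice of A, B, C."""
    c = complement
    return (
        A & (B | C) == (A & B) | (A & C)
        and A | (B & C) == (A | B) & (A | C)
        and c(A | B) == c(A) & c(B)
        and c(A & B) == c(A) | c(B)
        and c(A | B | C) == c(A) & c(B) & c(C)
        and c(A & B & C) == c(A) | c(B) | c(C)
    )

subsets = list(all_subsets(S))
print(all(identities_hold(A, B, C)
          for A in subsets for B in subsets for C in subsets))
```

Such a check is of course not a proof for arbitrary sets; it only confirms the results on the chosen outcome set.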

From the above examples a correspondence between sets, 
subsets, etc., to outcome set, events etc., is seen. These may be 
summarized and given as follows. 


    Events                                   Outcome sets

1.  Elementary event                         Point belonging to an outcome set

2.  Event                                    Subset of an outcome set

3.  Sure event                               Whole of the outcome set

4.  Impossible event                         Null set, which is evidently a subset of the outcome set

5.  Non-occurrence of an event               Complement of a subset of an outcome set

6.  Mutually exclusive events                Disjoint subsets of an outcome set

7.  Occurrence of at least one of the        $A \cup B$, where A and B are subsets of an outcome set
    events A and B

8.  Simultaneous occurrence of the           $A \cap B$, where A and B are subsets of the outcome set
    events A and B


Exercises 

2-17. If A and B are two sets such that $A \cup B = \{2, 5, 0, -1\}$ and $A \cap B = \{2\}$, find A and B. Are they unique?

2-18. In an experiment of throwing a coin 3 times find (a) three events which are mutually exclusive, (b) the complement of the event of getting exactly one head.

2-19. If $S = \{0, 1, 2, 3, 4, 5, 6\}$, $A = \{0, 1, 2, 3\}$, $B = \{3, 4, 5\}$, find (1) $\bar{A}$, (2) $A \cup B$, (3) $A \cap B$, (4) the complement of B with respect to A.

2-3. Functions. The reader may be familiar with point set functions or functions of the type $y = x^2 + x + 2$, $y = \sin x$, etc. where in general we get a curve in a two dimensional space. The relationship between the variables x and y, when represented in a graph, is given by a curve as shown in Fig. 2-10.

Fig. 2-10.

Corresponding to a point $x_1$ on the X-axis we get $y_1$ on the Y-axis. This curve $y = f(x)$ can be considered to be a correspondence between the points on the X-axis and the points on the Y-axis, where the rule which defines the correspondence is given by $y = f(x)$. We will generalize this notion and will define a function or a mapping as a correspondence between the elements of two sets.

Definition. A function or a mapping is a correspondence between the elements in the sets X and Y such that corresponding to an element in X there is a unique element in Y. This mapping can be written as

$$X \xrightarrow{\ f\ } Y$$

For convenience we may write the correspondence as $y = f(x)$ where x is any element in X and y is the corresponding element in Y. By this definition a number of points in X may be mapped to the same point in Y, but corresponding to any point in X there is only one point in Y.

Fig. 2-11.

X is called the domain of the function and Y is called the range of the function.

Ex. 2.3.1. Consider the sets X and Y where

     X      Y

    -1      2
     2      5
    -3     10
     6     37
     3     10

In these two sets, if any element in X is denoted by x and the corresponding element in Y is denoted by y then the correspondence may be written as $y = x^2 + 1$.

Comments. Sometimes it is possible to express the correspondence in known functions [i.e., the form of f(x) is known], like

$$y = 2x^3 + 3x + 5,\quad y = \sin x,\quad y = \log x + \frac{1}{x}\ \text{etc.}$$

In the function $y = x^2 + 1$, if x is defined on the set $R = \{x \mid -\infty < x < \infty\}$ then the domain of the function is the real line R and the range is evidently the line $S = \{y \mid 1 \le y < \infty\}$. Here both the domain and the range are sets of numbers.

Ex. 2.3.2. Consider the sets X and Y where the x's and y's represent the height and weight measurements of the students in a class.

     X      Y

    $x_1$   $y_1$
    $x_2$   $y_2$
     ...     ...
    $x_n$   $y_n$

For example $x_1$ is the height of one student and $y_1$ denotes his weight etc.

Comments. In this case we may not be able to find out the correspondence in a functional form like $y = x + 2$ or $y = x^2 - \dots$ etc. Here the elements in the sets X and Y are numbers. According to the definition X and Y need not be sets of numbers. The elements in X and Y can be other objects or sets etc. Even though we denote the mapping by $y = f(x)$ the functional form of f(x) need not be always known.

Ex. 2.3.3.

     X                                        Y

    $A_1 = \{x \mid \dots\}$                  1
    $A_2 = \{x \mid 4 \le x \le 7\}$          3
    $A_3 = \{x \mid 6 \le x \le 10\}$         4
    $A_4 = \{x \mid -2 < x < 0\}$             2

Comments. Evidently the elements in X are sets (intervals) and the elements in Y are numbers (namely the lengths of the intervals).


Ex. 2.3.4. Consider an experiment of tossing a coin three times. The elements in the outcome set are given below:

$V_1 = (H, H, H)$, $V_2 = (H, H, T)$, $V_3 = (H, T, H)$, $V_4 = (H, T, T)$,
$V_5 = (T, T, T)$, $V_6 = (T, T, H)$, $V_7 = (T, H, T)$, $V_8 = (T, H, H)$,

where H denotes a head and T denotes a tail. Let X be the set of events of getting one head, 2 heads and 3 tails respectively. Let Y denote the number of outcomes or the number of elementary events in the events mentioned above, then,

     X                                  Y

    $A_1 = \{V_4, V_6, V_7\}$           3
    $A_2 = \{V_2, V_3, V_8\}$           3
    $A_3 = \{V_5\}$                     1

Comments. In examples 2.3.3 and 2.3.4 the elements in X are sets and those in Y are numbers. The domain of the function is a set of sets and the range is a set of real numbers. Such functions are called set functions.


2.31. Set Functions. A function whose domain is a set of sets and whose range is a set of real numbers may be called a set function. In the following discussions we are interested only in set functions. In example 2.3.3 it is seen that $f(A_1 \cup A_2) = f(A_1) + f(A_2)$. In other words if x is an interval and if y is the length of the interval x, then evidently the length of the union of two disjoint intervals is the sum of the lengths of the intervals. In the same example

$$f(A_2 \cup A_3) = f(4 \le x \le 10) = 6$$

$$f(A_2) + f(A_3) = 3 + 4 = 7.$$

Here it is easily seen that $f(A_2 \cup A_3) = f(A_2) + f(A_3) - f(A_2 \cap A_3)$:

$$f(4 \le x \le 10) = f(4 \le x \le 7) + f(6 \le x \le 10) - f(6 \le x \le 7) = 3 + 4 - 1 = 6.$$

In a set function, if $f(A_1 \cup A_2 \cup \dots \cup A_n) = f(A_1) + f(A_2) + \dots + f(A_n)$ when $A_1, A_2, \dots, A_n$ are disjoint in the sense that the intersection of any two is a null set ($A_i \cap A_j = \phi$ for all i and j, $i \neq j$) then the set function is called additive. In examples 2.3.3 and 2.3.4 the set functions are additive. If a set S is partitioned into a countable number of disjoint sets $A_1, A_2, \dots$ and if a set function defined on S is such that

$$f(A_1 \cup A_2 \cup \dots \cup A_n \cup \dots) = f(A_1) + f(A_2) + \dots$$

then the function is called totally additive. (If we can number the partitions in S by the natural numbers 1, 2, 3, ... then the sets $A_1, A_2, \dots$ are called a countable partition of S; or if in a set there is a correspondence between the elements and the natural numbers 1, 2, 3, ... then the set is said to have a countable number of elements.) The total additivity condition may be written as

$$f\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} f(A_i) \quad \text{where } A_i \cap A_j = \phi,\ \forall\ i \text{ and } j,\ i \neq j.$$

Here $\bigcup_{i=1}^{\infty} A_i = A_1 \cup A_2 \cup A_3 \cup \dots$ and $\forall$ means 'for all'.

Total additivity is a property concerning the nature of the mapping. For example the mapping intervals $\to$ lengths of intervals is of the type that when a line segment PQ is divided into a countable number of disjoint segments the sum of the lengths of these segments equals the length of the line segment PQ.
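The behaviour of the length function on overlapping and on disjoint intervals can be tried out numerically. The following is a minimal sketch, assuming each interval is represented by a hypothetical pair (lo, hi) of end points; the endpoints used for $A_2$, $A_3$ and $A_4$ are those read off from example 2.3.3 and the discussion above.

```python
def length(intervals):
    """Total length of a union of intervals given as (lo, hi) pairs."""
    total = 0
    covered_to = float("-inf")
    for lo, hi in sorted(intervals):
        lo = max(lo, covered_to)      # skip the part already counted
        if hi > lo:
            total += hi - lo
            covered_to = hi
    return total

def intersection(a, b):
    """Intersection of two intervals, as a list holding zero or one interval."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return [(lo, hi)] if lo < hi else []

A2, A3, A4 = (4, 7), (6, 10), (-2, 0)

# Overlapping intervals: the lengths do not simply add.
print(length([A2, A3]), length([A2]) + length([A3]))

# Subtracting the length of the overlap restores equality.
print(length([A2]) + length([A3]) - length(intersection(A2, A3)))

# Disjoint intervals: the set function is additive.
print(length([A2, A4]), length([A2]) + length([A4]))
```

For a single point the length is zero, so closed and open intervals with the same end points receive the same value under this set function.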

2.33. Measure. A set function which is non-negative and totally additive is called a measure. This means that if f is a set function defined on S and if f is a measure it should satisfy the following conditions:

$$f(S) \ge 0$$

$$f\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} f(A_i) \quad \text{where } A_i \cap A_j = \phi,\ \forall\ i \text{ and } j,\ i \neq j$$

$$f(A_i) \ge 0 \quad \text{for all } A_i,\ i = 1, 2, 3, \dots$$

In examples 2.3.3 and 2.3.4 the set functions are measures, because they satisfy all the conditions above.

2.34. Probability Measure. If in a measure the total measure equals unity, or $f(S) = 1$, then the measure f is called a probability measure. This definition implies that if S is an outcome set then $f(S) = 1$. A probability measure is usually denoted by P. If $A_1, A_2, \dots, A_n$ is a disjoint partition of the outcome set S, then

$$S = A_1 \cup A_2 \cup \dots \cup A_n = \bigcup_{i=1}^{n} A_i.$$

Then

$$P(S) = P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i) = 1.$$

This shows that the probability of an event satisfies the conditions

(1) $0 \le P(A) \le 1$

(2) $P(S) = 1$

(3) $P\big(\bigcup_i A_i\big) = \sum_i P(A_i)$ when $A_i \cap A_j = \phi$, $\forall$ i and j, $i \neq j$.

Therefore the probability of an event A may be defined by the three axioms (1), (2) and (3). These axioms give the properties or the desirable qualities of the probability of an event A, but they do not give any clue as to how to assign a probability to an event A or how to evaluate the probability of an event A.

Let S be an outcome set. Let A, B, C be three mutually exclusive events such that $A \cup B \cup C = S$. According to the axioms or assumptions or postulates the probabilities of the events A, B, C should satisfy the conditions

$$0 \le P(A) \le 1,\quad 0 \le P(B) \le 1,\quad 0 \le P(C) \le 1$$

$$P(S) = P(A \cup B \cup C) = P(A) + P(B) + P(C) = 1.$$

The following are some of the possibilities:

    P(A) = 1/3    P(B) = 1/3    P(C) = 1/3
    P(A) = 1/6    P(B) = 1/2    P(C) = 1/3
    P(A) = 0.2    P(B) = 0.7    P(C) = 0.1   etc.

Comments. There are a number of ways in which we can assign probabilities to A, B, and C, according to the axioms. In order to determine uniquely the probability of an event A we need some more considerations. To some extent it is possible to determine the probability of an event A by considering the experimental conditions, symmetry, past experience etc.

Theorem 2.3.1.

$$P(\phi) = 0$$

Proof. If S is the outcome set, $S \cup \phi = S$ and further S and $\phi$ are mutually exclusive.

$$\therefore\ P(S \cup \phi) = P(S)$$

$$P(S \cup \phi) = P(S) + P(\phi) = P(S) \Rightarrow P(\phi) = 0.$$

Comments. The probability of an impossible event is 
zero, but the converse that, if the probability of an event is zero 
the event is impossible, need not be true. This point will be 
discussed later. 

Theorem 2.3.2.

$$P(\bar{A}) = 1 - P(A)$$

Proof. If S is the outcome set and if A is an event, $A \cup \bar{A} = S$ and further A and $\bar{A}$ are mutually exclusive.

$$\therefore\ P(S) = P(A \cup \bar{A}) = P(A) + P(\bar{A}).$$

But $P(S) = 1 \Rightarrow 1 = P(A) + P(\bar{A})$ or $P(\bar{A}) = 1 - P(A)$.

For example if the probability of the occurrence of an event A is 0.4 then the probability of non-occurrence of the event A is $1 - 0.4 = 0.6$.

Theorem 2.3.3.

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

Proof. $A \cup B = A \cup (\bar{A} \cap B)$, and A and $\bar{A} \cap B$ are mutually exclusive.

$$\therefore\ P(A \cup B) = P(A) + P(\bar{A} \cap B) = P(A) + [P(\bar{A} \cap B) + P(A \cap B)] - P(A \cap B).$$

But $A \cap B$ and $\bar{A} \cap B$ are mutually exclusive.

$$\therefore\ P(A \cap B) + P(\bar{A} \cap B) = P[(A \cap B) \cup (\bar{A} \cap B)] = P[(A \cup \bar{A}) \cap B] = P[S \cap B] = P(B)$$

$$\therefore\ P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
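Theorems 2.3.1 to 2.3.3 may be checked on a small example. The sketch below assumes, purely for illustration, an outcome set of six equally likely points (one roll of a die) and two hypothetical events A and B; exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

S = frozenset(range(1, 7))   # hypothetical outcome set: one roll of a die

def P(event):
    """Probability when all points of S are taken as equally likely."""
    return Fraction(len(event & S), len(S))

A = frozenset({2, 4, 6})     # an even number turns up
B = frozenset({4, 5, 6})     # a number greater than 3 turns up

print(P(frozenset()))                    # Theorem 2.3.1
print(P(S - A), 1 - P(A))                # Theorem 2.3.2
print(P(A | B), P(A) + P(B) - P(A & B))  # Theorem 2.3.3
```

In each printed line the two sides of the corresponding theorem agree for this particular choice of A and B.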

Ex. 2.34.2. Suppose that the probability that a woman entering a shop buys chewing gum is 0.80, the probability that she buys sleeping pills is 0.70, and the probability that she buys chewing gum and sleeping pills is 0.55. What is the probability that a woman entering the shop buys chewing gum or sleeping pills or both?

Solution. Let A be the event that a woman entering a shop buys chewing gum, and let B be that of buying sleeping pills; then $P(A) = 0.80$, $P(B) = 0.70$ and $P(A \cap B) = 0.55$.

Therefore, $P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.80 + 0.70 - 0.55 = 0.95$.

Comments. It may be noticed that if $A \cap B = \phi$ then $P(A \cap B) = 0$ and $P(A \cup B) = P(A) + P(B)$.

Ex. 2.34.3. Suppose the probability that a man selling curios at Kanya Kumari in the southern tip of India will have a customer on a Sunday afternoon who will buy a decorated sea shell is 0.80, that he will buy a sea shell chain is 0.40, and that he will buy a decorated shell and a sea shell chain is 0.30. What is the probability that the shopkeeper has a customer who will buy at least one of the items, a decorated shell and a sea shell chain?

Solution. Let A and B denote the events that the customer will buy a decorated sea shell and a sea shell chain respectively; then $P(A) = 0.80$, $P(B) = 0.40$ and $P(A \cap B) = 0.30$. We are asked to find out $P(A \cup B)$.

But $P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.80 + 0.40 - 0.30 = 0.90$.


Exercises 

2-20. Give three examples each for

(1) an additive set function,

(2) a measure,

(3) a probability measure.

2-21. If A, B, C are a disjoint partition of the outcome set S, can the following measures be probability measures?

(a) P(A) = 0.5, P(B) = 0.3, P(C) = 0.2

(b) P(A) = 0.5, P(B) = 0.8, P(C) = 0.7

(c) P(A) = 0.7, P(B) = 0.3, P(C) = -0.1

(d) P(A) = 1.1, P(B) = 0.5, P(C) = 0.3.

2-22. If P denotes a probability measure show that $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C)$ and generalize the result.

2-23. Let A be the event that a philosopher taking an evening walk in the museum park in Trivandrum on a Sunday evening will see a girl in a cotton sari, and let B be the event that he will see a girl who has anna nada (walking like a swan in a lake). Let the probabilities for these events be given as P(A) = 0.50, P(B) = 0.40 and $P(A \cap B) = 0.20$. Interpret the events $A \cup B$, $\bar{A} \cup B$, $A \cup \bar{B}$, $\bar{A} \cup \bar{B}$, $\bar{A} \cap B$, $A \cap \bar{B}$ and $\bar{A} \cap \bar{B}$ and evaluate the corresponding probabilities.

2-24. Consider an experiment of throwing a coin 3 times. How many possible events are there?

2-4. HOW TO ASSIGN PROBABILITIES TO VARIOUS 

EVENTS 

In the previous section we have seen that the probability of an event is a non-negative number which is less than or equal to one. Also we noticed that with these properties for a probability there are a number of possibilities for the probability of an event A. Here we will discuss further considerations which will enable us to determine the probability of an event in some sense.

2.41. Symmetry. Consideration of symmetry in the outcomes of an experiment is a useful tool for deciding the probability of an event. Consider an experiment of rolling a perfectly symmetric die once. If the die is as nearly perfect and symmetric as possible with respect to the six sides marked 1, 2, 3, 4, 5, 6 then it is justifiable to assign equal probabilities to the occurrence of 1, the occurrence of 2, ..., the occurrence of 6. Here the outcome set is

$$S = \{1, 2, 3, 4, 5, 6\} = \{1\} \cup \{2\} \cup \{3\} \cup \dots \cup \{6\}$$

$$1 = P(S) = P\{1\} + P\{2\} + \dots + P\{6\}.$$

From symmetry we assume that

$$P\{1\} = P\{2\} = \dots = P\{6\} = \frac{1}{6}.$$

$\therefore$ The probability of getting any one face (say 6) when a symmetric die is rolled once is 1/6.

Consider another example of tossing a coin which is as symmetric and as perfect as possible with respect to the two sides. In such a case the coin may be called unbiased. If the possibilities in a trial are head and tail then each can be given a probability 1/2.

Ex. 2.41.1. If an unbiased coin is thrown 3 times find the probability of getting at least one head.

Solution. The outcomes may be denoted by

    $V_1 = (H, H, H)$    $V_2 = (H, H, T)$
    $V_3 = (H, T, H)$    $V_4 = (H, T, T)$
    $V_5 = (T, T, T)$    $V_6 = (T, T, H)$
    $V_7 = (T, H, T)$    $V_8 = (T, H, H)$

then the event A of getting at least one head is $A = \{V_1, V_2, V_3, V_4, V_6, V_7, V_8\}$ and

$$P(A) = P\{V_1\} + P\{V_2\} + P\{V_3\} + P\{V_4\} + P\{V_6\} + P\{V_7\} + P\{V_8\}$$

(since $V_1, V_2, \dots, V_8$ are mutually exclusive elementary events). We may assume equal probabilities 1/8 each for all these elementary events since there is complete symmetry in the experimental outcomes.

Therefore, $P(A) = 7/8$.

Comments. From this example it may be noticed that $P(A) = 1 - P(\bar{A})$ and $\bar{A}$ has only one element $V_5$, and therefore $P(A) = 1 - P\{V_5\} = 1 - 1/8 = 7/8$. This may also be considered to be the total number of elementary events or outcomes or sample points favourable to the event A, divided by the total number of elementary events, that is,

$$P(A) = \frac{\text{Total number of elementary events favourable to } A}{\text{Total number of elementary events}}$$

where there is symmetry in the outcomes. Whenever there is symmetry in the experimental outcomes the above ratio can be used as a tool for assigning probabilities to events.
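The ratio of favourable to total elementary events is easy to compute by listing the outcome set. The following is a minimal sketch for Ex. 2.41.1, assuming complete symmetry in the outcomes; the function name `probability` is introduced here only for illustration.

```python
from fractions import Fraction
from itertools import product

# All elementary events for three throws of a coin.
outcomes = list(product("HT", repeat=3))

def probability(event):
    """Favourable elementary events divided by all elementary events."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

at_least_one_head = probability(lambda o: "H" in o)
no_head = probability(lambda o: "H" not in o)

print(at_least_one_head)   # direct count
print(1 - no_head)         # through the complement
```

Both lines give the same value, in agreement with $P(A) = 1 - P(\bar{A})$.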


Ex. 2.41.2. Find out the probability of getting an ace from a well shuffled deck of 52 cards if a card is selected at random.

Solution. We may assume symmetry because the cards are well shuffled. The total number of elements in the outcome set is 52 and the number of elements favourable to the event is 4 and hence the required probability is 4/52.

Comments. Here at random means that no importance is given to any one card when selecting a card, or all the cards are given equal chances of being selected, in the sense that symmetry is not lost while selecting a card.

Ex. 2.41.3. A consignment of 100 radios contains 25 defective ones. What is the probability of getting 3 defectives in a random sample of size 10 from the consignment?

Solution. Here the sample is a random sample, i.e., all the samples of size 10 are given equal chances of being selected. We can assume symmetry. The total number of possible samples of size 10 is $\binom{100}{10}$. The number of ways in which a random sample of size 10 containing 3 defectives may be selected

$$= \binom{25}{3}\binom{75}{7}.$$

Hence the required probability

$$= \frac{\binom{25}{3}\binom{75}{7}}{\binom{100}{10}}.$$

Comments. It is seen that consideration of symmetry is a useful tool for assigning probabilities. But when the outcome set has an infinite number of elements the method of taking the ratio (the number of elements favourable to an event to the total number of elements) must be modified to suit the experiment under consideration.
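The counting argument used for the consignment of radios can be written as a small function. This is a sketch only: the name `sample_probability` and its parameter names are hypothetical, and the binomial coefficients are taken from the standard library function `math.comb`.

```python
from fractions import Fraction
from math import comb

def sample_probability(total, defective, sample_size, wanted):
    """Probability that a random sample of the given size contains exactly
    `wanted` defectives, all samples being equally likely."""
    favourable = comb(defective, wanted) * comb(total - defective, sample_size - wanted)
    return Fraction(favourable, comb(total, sample_size))

# The consignment of 100 radios with 25 defectives, sample of size 10.
p = sample_probability(100, 25, 10, 3)
print(p, float(p))

# The probabilities for 0, 1, ..., 10 defectives add up to unity.
print(sum(sample_probability(100, 25, 10, k) for k in range(11)))
```

The same function reproduces the ace example when called as `sample_probability(52, 4, 1, 1)`.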

2.42. Method of Relative Frequencies. Consideration of symmetry may not be suitable for all problems. For example, if we want to find out the probability of getting a head in an experiment of throwing a biased coin, the probability that a new born baby in a particular set of people is a boy, the probability that a man will die, the probability that a monkey will type this book word by word if it is given a typewriter to play with etc., symmetry is of very little use in finding out the probabilities. It is quite unlikely, but not impossible, that a monkey will type this book word by word if it is given a typewriter to play with. We are quite justified in assigning a probability zero to this event. This is almost surely an impossible event. So far there is no known case of a man living forever. We are justified in assigning a probability one to the event that a man will die. This is almost surely a sure event. These types of arguments have led to a definition of probability as a measure of conviction of mind based on experience. In an experiment of throwing a coin, if everything is known about the coin such as all physical characteristics of the coin, the forces acting on the coin etc., one may be able to say whether the outcome is head or tail. So some people may argue that probability is in some sense a measure of our ignorance about the various aspects of the experiment. For an elaborate discussion of personal probability, utility etc., the reader is advised to see 'Foundations of Statistics' by L.J. Savage.

Even if there is symmetry, in many cases, a man with a logical mind sometimes gets confused in deciding symmetry in an experiment. For example, consider the following problem. A businessman wants to go for a business trip. Two of his young secretaries Miss Chick and Miss Cute want to accompany him. He wants only one secretary for the trip. He decides to conduct a game of chance and take a decision. He throws an unbiased coin twice. If there is at least one head he will take Miss Chick. Otherwise Miss Cute will be taken. What is the probability that Miss Chick will be taken? This may be argued like this. There are four possibilities (H, H), (H, T), (T, H) and (T, T); out of these 3 are favourable to Miss Chick's selection. So the probability is 3/4. Some people may argue like this. If head comes in the first trial the experiment is over and Miss Chick is selected. Therefore the possible outcomes are (T, H), (T, T) and H, and hence the required probability is 2/3.


These show that consideration of symmetry alone does not uniquely determine the probability of an event. Another method of determining the probability of an event is consideration of relative frequencies. Suppose that we want to find out the probability of getting a head in throwing a coin, assuming that the coin will fall either head or tail, but nothing is known about the biasedness of the coin. We can estimate the probability of getting a head by conducting an experiment. Throw the coin 100 times under the same conditions without giving any importance to any side. Count the number of heads. Take the relative frequency, that is, the ratio of the number of heads to the total number of trials. Repeat the experiment 1000 times and take the relative frequency. Continue the experiment. If this relative frequency tends to a limit in the long run, this limit may be taken as the probability of getting a head if that coin is thrown under the conditions of the experiment. If the experiment is conducted a sufficiently large number of times then a good estimate of the probability is given by the relative frequency. There are other questions in this respect. We do not know whether there exists a limit or not. If a limit does not exist our estimate of the probability has no meaning. Further, the repeatability of an experiment under the same conditions is assumed.
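The method of relative frequencies can be imitated on a computer. The sketch below simulates the businessman's game of throwing a coin twice, under the assumption that a pseudo-random number generator with a fixed seed is an acceptable stand-in for an unbiased coin; the function name `relative_frequency` and the seed are chosen only for illustration.

```python
import random

def relative_frequency(trials, seed=0):
    """Relative frequency of 'at least one head in two throws'
    over the given number of simulated trials."""
    rng = random.Random(seed)          # fixed seed, so the run is repeatable
    hits = 0
    for _ in range(trials):
        first_is_head = rng.random() < 0.5
        second_is_head = rng.random() < 0.5
        if first_is_head or second_is_head:
            hits += 1
    return hits / trials

for n in (100, 1000, 100000):
    print(n, relative_frequency(n))
```

With a large number of trials the relative frequency settles near 3/4 rather than 2/3, which suggests that the three outcomes H, (T, H) and (T, T) of the second argument are not equally likely.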


The discussion of the definition and the methods of evaluation 
of probability is not complete in this section. For further reading 
see the references at the end of this chapter. 

2-5. SOME USEFUL CORRESPONDENCE BETWEEN 
SET THEORY AND PROBABILITY THEORY 

     Set Theory                                   Probability Theory

1.   Outcome set S (sample space,                 Sure event.
     possibilities space etc.)

2.   Subset of an outcome set.                    Event.

3.   Element of the outcome set.                  Elementary event.

4.   Disjoint subsets.                            Mutually exclusive events.

5.   Null set $\phi$.                             Impossible event.

6.   Totally additive, non-negative set           Probability measure.
     function with total measure unity.

7.   $P(A) = 0$.                                  A is almost surely an impossible event.

8.   $P(A) = 1$.                                  A is an almost sure event.

9.   $A \cup B$ (where A and B are                At least one of the events A and B occurs.
     subsets of S).

10.  $A \cap B$.                                  Simultaneous occurrence of A and B.

11.  $\bar{A}$ (complement of A).                 Non-occurrence of the event A.

Exercises 


2-25. In a class there are 30 boys and 20 girls. From the class list one name is picked up at random. What is the probability that it is a boy's name?

2-26. … has 7 spades. What is the probability that his partner has

(a) 2 spades?

(b) at least 2 spades?

2-27. What is the probability of throwing 7, 8 or 9 with 2 dice?

2-28. In a set of 100 businessmen 10 are unmarried and the rest are married. What is the probability of selecting a sample of 20 businessmen out of which … are married, if the sample is selected at random?

2-29. In a community of 409 people 200 are highly intelligent, 100 are above average, 50 are average and the rest are idiots. If a random sample of size 40 is taken, what is the probability that the sample contains 20 highly intelligent, 10 above average and 10 average?

2-30. A psychiatrist reports that his survey of … people on a lunar eclipse shows that 70 are psychopaths, … have xenophobia, 20 are psychopaths and have xenophobia, and … are neither psychopaths nor have xenophobia. Should you question the results in his survey?

2-6. CONDITIONAL PROBABILITY

In the theory discussed so far we were concerned about probabilities of events (subsets of an outcome set S) relative to an outcome set S. That is, P(A) is the probability measure of A relative to S. This may be denoted, for convenience, as $P(A \mid S)$ (probability of an event A relative to S, or probability of A given S). So a probability statement such as 'the probability of an event A is 0.95' is a conditional statement relative to a specified outcome set. In the following discussion we will consider probabilities of events relative to various events, for example probability statements like $P(A \mid B) = 0.3$ (probability of the event A given the event B is 0.3). When B is the outcome set S we will denote the conditional probability $P(A \mid S)$ as P(A), otherwise we will denote the conditional probability by $P(A \mid B)$.

Ex. 2.6.1. Consider the following problem. In Dr. Nimbus' reducing laboratory people are given one of the two treatments, VL 50 and VL 100. There are 80 people taking treatment, out of which some are divorced and looking for the next husband and some are unmarried, some weigh more than 150 lbs. and some weigh less than 150 lbs. The following is the exact classification.

    VL 50          Over 150 lbs.    Less than 150 lbs.

    Divorced            15                 10
    Unmarried           20                  5

    VL 100         Over 150 lbs.    Less than 150 lbs.

    Divorced            12                  1
    Unmarried           15                  2

A visitor, Miss Iceberg, goes to Dr. Nimbus' laboratory. Assuming that every patient in the laboratory is given an equal chance of being called to give testimony of her achievement in weight reduction, what is the probability that Miss Iceberg will hear testimony from a patient who is (1) undergoing treatment VL 50, (2) divorced, (3) heavier than 150 lbs.?

Solution. Here there are 80 patients and all are assumed to have an equal chance of being called. For the probabilities in (1), (2) and (3) we are concerned with the outcome set S of all 80 patients. Let A, B and C denote the events in (1), (2) and (3) respectively. Then

$$P(A) = \frac{15 + 10 + 20 + 5}{80} = \frac{50}{80}$$

$$P(B) = \frac{15 + 10 + 12 + 1}{80} = \frac{38}{80}$$

$$P(C) = \frac{15 + 20 + 12 + 15}{80} = \frac{62}{80}.$$

What is the probability that Miss Iceberg will hear a person who is over 150 lbs. given that the speaker is divorced? This can be denoted by $P(C \mid B)$ = probability of the event C given B. Here we are concerned only with the divorced patients. So the set under consideration is $S_1$, the set of divorced persons, where $S_1$ is given by


                   Over 150 lbs.    Less than 150 lbs.

    VL 50               15                 10
    VL 100              12                  1

                                        Total = 38

Hence

$$P(C \mid B) = \frac{15 + 12}{38} = \frac{27}{38}.$$

Comments. In this example it may be noticed that $P(C \cap B)$ = probability that the speaker is divorced and over 150 lbs. in weight = 27/80. It may be further noticed that

$$\frac{P(C \cap B)}{P(B)} = \frac{27}{80} \Big/ \frac{38}{80} = \frac{27}{38} = P(C \mid B).$$

Further, given B, the probabilities of hearing a person who weighs more than 150 lbs. and is taking VL 50, who weighs less than 150 lbs. and is taking VL 50, who weighs more than 150 lbs. and is taking VL 100, and who weighs less than 150 lbs. and is taking VL 100, add up to unity. That is, the probabilities $P(A_i \mid B)$ add up to unity in the reduced set B. With these intuitive notions we will define conditional probability as follows.

2.61. Definition. If A and B are events and if $P(B) \neq 0$ then the conditional probability of A given B may be defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

i.e., the probability of A in the reduced set B equals the probability of $A \cap B$ in the outcome set S, divided by the probability of B in the outcome set S. From the above definition we have a general multiplication rule as follows.

$$P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A).$$

This rule can be extended to a number of events. For example

$$P(A \cap B \cap C) = P(A \mid B \cap C) \cdot P(B \cap C) = P(A \mid B \cap C) \cdot P(B \mid C) \cdot P(C) \ \text{etc.}$$
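The definition may be tried out on the classification given in Ex. 2.6.1. The following is a minimal sketch: the dictionary `counts` and the helper names `P` and `P_given` are introduced here for illustration, and every patient is assumed to have an equal chance of being called.

```python
from fractions import Fraction

# Counts from Ex. 2.6.1, keyed by (treatment, marital status, weight class).
counts = {
    ("VL 50", "divorced", "over"): 15,
    ("VL 50", "divorced", "under"): 10,
    ("VL 50", "unmarried", "over"): 20,
    ("VL 50", "unmarried", "under"): 5,
    ("VL 100", "divorced", "over"): 12,
    ("VL 100", "divorced", "under"): 1,
    ("VL 100", "unmarried", "over"): 15,
    ("VL 100", "unmarried", "under"): 2,
}
total = sum(counts.values())

def P(event):
    """Probability of an event described by a test on the classification key."""
    return Fraction(sum(n for key, n in counts.items() if event(key)), total)

def P_given(event, given):
    """Conditional probability P(event | given), assuming P(given) is not zero."""
    return P(lambda k: event(k) and given(k)) / P(given)

divorced = lambda k: k[1] == "divorced"
over_150 = lambda k: k[2] == "over"

print(P_given(over_150, divorced))
# Multiplication rule: P(C n B) = P(C | B) P(B) = P(B | C) P(C).
print(P_given(over_150, divorced) * P(divorced),
      P_given(divorced, over_150) * P(over_150))
```

The two products in the last line agree, as the multiplication rule requires.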

Ex. 2.61.1. A consignment of 20 radios contains 6 defectives. Radios are selected at random one by one and examined. The radios examined are not put back. What is the probability that the 10th one examined is the last defective?

Solution. There must be exactly 5 defectives in the first 9 examinations. Let A denote the event of getting 5 defectives in the first 9 examinations.

$$P(A) = \frac{\binom{14}{4}\binom{6}{5}}{\binom{20}{9}}$$

Let B be the event that the 10th radio examined is a defective.

$$P(B \mid A) = \frac{1}{11}$$

(There are 11 radios left, out of which one is a defective.) The probability that the 10th one is the 6th defective

$$= P(A \cap B) = P(A) \cdot P(B \mid A) = \frac{\binom{14}{4}\binom{6}{5}}{\binom{20}{9}} \cdot \frac{1}{11} = \frac{21}{6460}.$$

Ex. 2.61.2. A box contains 3 red balls and 7 green balls. Balls are picked out one by one at random without replacement. What is the probability that the second ball is green given that the first one is green?

Solution. If the first one is green and if it is not put back then there are 6 green balls out of 9 balls and hence the required probability is 6/9.
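The arithmetic in the last two examples can be checked with exact fractions. This is a sketch under the assumptions stated in the examples (random selection without replacement); the variable names are chosen only for illustration.

```python
from fractions import Fraction
from math import comb

# Ex. 2.61.1: exactly 5 of the 6 defectives among the first 9 radios examined.
p_A = Fraction(comb(6, 5) * comb(14, 4), comb(20, 9))
# One defective remains among the 11 radios left.
p_B_given_A = Fraction(1, 11)
p_tenth_is_last_defective = p_A * p_B_given_A
print(p_tenth_is_last_defective)

# Ex. 2.61.2: P(second green | first green) from the definition.
p_first_green = Fraction(7, 10)
p_both_green = Fraction(7, 10) * Fraction(6, 9)   # multiplication rule
print(p_both_green / p_first_green)
```

The second result is the fraction 6/9 of the solution, printed in its lowest terms.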

2.62. Independence of Events. Two events A and B are said to be independent if $P(A \cap B) = P(A) \cdot P(B)$. Independent events may be explained as events where the occurrence of one does not affect the occurrence or non-occurrence of the other. This explanation is often misleading, so the reader is advised to stick to the definition $P(A \cap B) = P(A) \cdot P(B)$.

Ex. 2.62.1. In example 2.61.2 what is the probability of drawing two green balls if the first ball is replaced?

Solution. Here the probability of drawing the first green ball is 7/10. The second trial is independent of the first trial and the probability of getting a green ball is 7/10. Hence the required probability is

$$\frac{7}{10} \cdot \frac{7}{10} = \frac{49}{100}.$$

Lemma 2.62.1. If A and B are independent events and if $P(B) \neq 0$ then $P(A \mid B) = P(A)$.

Proof. If A and B are independent, $P(A \cap B) = P(A)\,P(B)$.

But

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B)}{P(B)} = P(A).$$

Lemma 2.62.2. If A and B are independent prove that (1) $\bar{A}$ and $\bar{B}$ are independent, (2) A and $\bar{B}$ are independent, (3) $\bar{A}$ and B are independent.

i.e., (1) Prove that $P(\bar{A} \cap \bar{B}) = P(\bar{A}) \cdot P(\bar{B})$ given that $P(A \cap B) = P(A) \cdot P(B)$.

Proof.

$$P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B}) = 1 - P(A \cup B)$$

$$= 1 - [P(A) + P(B) - P(A \cap B)]$$

$$= 1 - P(A) - P(B) + P(A) \cdot P(B)$$

$$= [1 - P(A)][1 - P(B)] = P(\bar{A})\,P(\bar{B}).$$

(2) and (3) are left to the reader.

2.62. Pairwise independence. The events $A_1, A_2, \ldots, A_n$
are said to be pairwise independent if $P(A_i \cap A_j) = P(A_i)\,P(A_j)$ for
all i and j, $i \neq j$.

2.63. Mutual independence. The events $A_1, A_2, \ldots, A_n$ are
mutually independent if the probability of the intersection of any
two or more of them is equal to the product of their probabilities.

Fig. 2-13. (Venn diagram of the events A, B and C.)

In Fig. 2-13 the probabilities of the pairwise intersections, such as
$P(B \cap C)$, may be compared with the products of the corresponding
probabilities to check pairwise independence.
Here the numbers in the various subsets denote the numbers
of elementary events in the various events and symmetry in the
outcomes is assumed.
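
The difference between pairwise and mutual independence can be seen on a small outcome set. The sketch below does not use the numbers of Fig. 2-13; it assumes a different, standard illustration (two tosses of a balanced coin), and the events A, B and C defined in it are chosen only for this purpose.

```python
from fractions import Fraction
from itertools import product

# Outcome set for two tosses of a balanced coin; all 4 outcomes equally likely.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event given as a set of outcomes (symmetry assumed)."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == "H"}      # first toss is a head
B = {o for o in outcomes if o[1] == "H"}      # second toss is a head
C = {o for o in outcomes if o[0] == o[1]}     # both tosses agree

# Check the product rule for every pair, then for all three events together.
pairwise = all(prob(X & Y) == prob(X) * prob(Y)
               for X, Y in [(A, B), (A, C), (B, C)])
mutual = pairwise and prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise, mutual)   # True False
```

Each pair satisfies the product rule, but the triple intersection has probability 1/4 while the product of the three probabilities is 1/8, so the events are pairwise independent without being mutually independent.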

Lemma 2.62.3. If $B_1, B_2, \ldots, B_n$ denote a disjoint partition
of the outcome set and if $P(B_i) \neq 0$ for $i = 1, 2, \ldots, n$ then for any
event A,

$$P(A) = \sum_{i=1}^{n} P(B_i)\,P(A \mid B_i).$$

Proof. In this case A may be written as

$$A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n)$$

(this may be verified from Fig. 2-14), where
$A \cap B_1, A \cap B_2, \ldots, A \cap B_n$ are all disjoint. Therefore

$$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_n)$$

$$= P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + \cdots + P(B_n)\,P(A \mid B_n)$$

$$= \sum_{i=1}^{n} P(B_i)\,P(A \mid B_i).$$

Ex. 2.62.2. Four people, Mr. Fox, Miss Tod, Mr. Rock and
Mr. Jack, compete for the presidency of Piggyland. A public
opinion poll gives the estimates of the probabilities of their winning
as 0.40, 0.20, 0.30 and 0.10 respectively. The probabilities that
gambling will be nationalized by them, if they are elected, are 0.85,
0.90, 0.30 and 0.95 respectively.

What is the probability that gambling will be nationalized after
the presidential election?

Solution. Let A be the event that gambling will be nationalized
after the presidential election.

Let $B_1, B_2, B_3$ and $B_4$ be the events that Mr. Fox, Miss Tod,
Mr. Rock and Mr. Jack will be elected, respectively. Then we
are given that

$P(B_1) = 0.40$, $P(B_2) = 0.20$, $P(B_3) = 0.30$ and $P(B_4) = 0.10$,
$P(A \mid B_1) = 0.85$, $P(A \mid B_2) = 0.90$, $P(A \mid B_3) = 0.30$
and $P(A \mid B_4) = 0.95$.

$$P(A) = \sum_{i=1}^{4} P(B_i)\,P(A \mid B_i)$$

$$= 0.40 \times 0.85 + 0.20 \times 0.90 + 0.30 \times 0.30 + 0.10 \times 0.95$$

$$= 0.705.$$
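
The sum in Lemma 2.62.3 is easy to evaluate mechanically. The sketch below uses the four poll probabilities as given in the statement of Ex. 2.62.2 (0.40, 0.20, 0.30 and 0.10, which add up to 1); the list names are illustrative only.

```python
from fractions import Fraction

# Prior probabilities P(B_i) of winning, from the poll in Ex. 2.62.2.
prior = [Fraction(40, 100), Fraction(20, 100),
         Fraction(30, 100), Fraction(10, 100)]
# Conditional probabilities P(A | B_i) of nationalization.
likelihood = [Fraction(85, 100), Fraction(90, 100),
              Fraction(30, 100), Fraction(95, 100)]

# The B_i must form a partition of the outcome set.
assert sum(prior) == 1

# P(A) = sum over i of P(B_i) * P(A | B_i)
p_a = sum(p * l for p, l in zip(prior, likelihood))
print(float(p_a))   # 0.705
```

The assertion on the priors is a useful habit: the lemma only applies when the probabilities of the partition add up to 1.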

Theorem 2.62.4. Bayes' Rule. If $B_1, B_2, \ldots, B_n$ denote a
disjoint partition of the outcome set, $P(B_i) \neq 0$ for $i = 1, 2, \ldots, n$ and
$P(A) \neq 0$ then

$$P(B_r \mid A) = \frac{P(B_r)\,P(A \mid B_r)}{\sum_{i=1}^{n} P(B_i)\,P(A \mid B_i)}$$

for $r = 1, 2, \ldots, n$.

Proof. By definition

$$P(B_r \mid A) = \frac{P(B_r \cap A)}{P(A)} = \frac{P(B_r)\,P(A \mid B_r)}{P(A)}.$$

By Lemma 2.62.3, $P(A) = \sum_{i=1}^{n} P(B_i)\,P(A \mid B_i)$.

Hence the result.

Here in this theorem the $P(B_i)$ for $i = 1, 2, \ldots, n$ may be called
prior or a priori probabilities in the sense that they are probabilities
determined before the observation of the occurrence of any
event and $P(B_i \mid A)$ for $i = 1, 2, \ldots, n$ may be called posterior or a
posteriori probabilities in the sense that these are probabilities of
$B_i$ for $i = 1, 2, \ldots, n$ after observing A. This theorem is often called
Bayes' Rule and is important because it gives some sort of inverse
reasoning. This aspect may be seen from the following example.
Bayes assumed the principle of equal division of ignorance in an
unknown situation, i.e., he assumed that all the prior probabilities
$P(B_i)$ for $i = 1, 2, \ldots, n$ are equal if nothing is known about
them. There is a lot of controversy regarding this point. For
further information see the references given at the end of this
chapter.

Ex. 2.62.3. Three machines X, Y, and Z of equal capacities
are producing bullets. The probabilities that the machines produce
defectives (bullets which do not satisfy the specifications) are 0.1, 0.2
and 0.1 respectively. A bullet is taken at random from a day's
production and is found to be defective. What is the probability that
it came from machine X?

Solution. Let $B_1, B_2, B_3$ be the events of getting a bullet
which came from the machines X, Y, Z respectively. Let A be
the event of getting a defective bullet. Since the machines are
of equal capacity we can assume that $P(B_1) = P(B_2) = P(B_3) = 1/3$.

$P(A \mid B_1)$ = probability of getting a defective given that it
came from X = 0.1,
$P(A \mid B_2) = 0.2$ and $P(A \mid B_3) = 0.1$.

We want the probability that the defective came from
X, that is, $P(B_1 \mid A)$ = probability that the bullet came from
X given that it is defective and, therefore,

$$P(B_1 \mid A) = \frac{P(B_1)\,P(A \mid B_1)}{\sum_{i=1}^{3} P(B_i)\,P(A \mid B_i)}$$

$$= \frac{(1/3)(0.1)}{(1/3)(0.1) + (1/3)(0.2) + (1/3)(0.1)}$$

$$= 0.1/0.4 = 1/4.$$
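
Bayes' Rule in Theorem 2.62.4 can be written as a short function that turns prior probabilities and conditional probabilities into posterior probabilities. This is a minimal sketch applied to the three machines of Ex. 2.62.3; the function name `posterior` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Return the list of P(B_r | A) from P(B_i) and P(A | B_i) by Bayes' Rule."""
    joint = [p * l for p, l in zip(prior, likelihood)]   # P(B_i) P(A | B_i)
    total = sum(joint)            # P(A), by the lemma on total probability
    return [j / total for j in joint]

prior = [Fraction(1, 3)] * 3                                   # machines X, Y, Z
likelihood = [Fraction(1, 10), Fraction(2, 10), Fraction(1, 10)]

post = posterior(prior, likelihood)
print(post[0])   # 1/4
```

The posterior probabilities always add up to 1, since each is one term of P(A) divided by P(A).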

Exercises


2.31. Find the probability of drawing two spades from a well shuffled 
deck of 52 cards if (1) the first card is replaced before the second one is taken, 
(2) the first one is not replaced. 

2.32. Give one example each of two events which are (a) mutually 
exclusive and independent, ( b ) mutually exclusive but not independent (c) 
not mutually exclusive but independent, ( d ) not mutually exclusive and not 
independent. 

2.33. Give 2 examples of three events which are pairwise independent 
but not mutually independent. 

2.34. A radio station broadcasts the correct time every hour
on the hour. What is the probability that a listener who switches on the
radio at random has to wait less than 20 minutes to hear the correct time?


2.35. A line cuts the line segment A B into two parts. What is the 
probability that the ratio of the two segments (smaller to the larger) is less 
than 1/4 ? 

2.36. If A and B are mutually exclusive events and if P(A) = 0.5 and
P(B) = 0.3 find the probability of (a) $A \cup B$, (b) $A \mid B$, (c) $A \cap B$.

2.37. There are three machines producing 10,000, 20,000, 30,000
bullets per hour respectively. These machines are known to be producing
1%, 2%, 1% defectives respectively. One bullet is taken at random from an
hour's production of the three machines and found to be defective. What
is the probability that this bullet came from the third machine?

2.38. From a box containing 5 red and 10 white rose flowers two
flowers are taken at random one by one. If the first one is a red rose, what
is the probability that (a) the second one is red, (b) the second one is white?

2.39. A survey is conducted on two random samples of 100 men and
0 women; it is seen that 2 men are deaf and one woman is deaf. Taking
these proportions as estimates of the probabilities of getting a deaf man and
a deaf woman respectively, what is the probability that a deaf person taken
at random is a male?


2.40. Among three identical urns one has 2 red marbles, one has one
red and one green marble and the third has 2 green marbles. One urn is
taken at random and then a marble is picked at random. It is found to be
red. What is the probability that the other marble in the urn is also red?

2.41. Suppose that one of three men, a politician, a businessman and an
..., will be appointed as the chancellor of a university. The respective
probabilities of their appointments are ..., 0.30, 0.20. The probabilities that
research ... promoted by these people if they are appointed are
0.30, 0.70, ... respectively. What is the probability that ...

2.42. The chance of a particular engineering project's failure depends on
whether machine X fails or not. The probability that machine X fails is 0.1.
The probability that the project fails if X fails is 0.9 and is otherwise 0.2.
It is seen that the project has failed. What is the probability that the
failure is due to the failure of X?
2.7. ENTROPY OF A FINITE SCHEME

This concept is widely used in Information Theory and in
the Theory of Communications. We will give a brief introduction
in this section. For additional reading in this line references are
given at the end of this chapter.

2.71. A Complete System of Events. A system of events,
$A_1, A_2, \ldots, A_n$, in which one and only one of them occurs in each
trial may be called a complete system of events.

For example,

(a) Appearance of head or tail in the throwing of a coin.

(b) Appearance of 1 or 2 or 3 or 4 or 5 or 6 in the rolling
of a die once.

(c) A set of mutually exclusive events $A_1, A_2, \ldots, A_n$ such
that $A_1 \cup A_2 \cup \cdots \cup A_n = S$ (the outcome set).

2.72. Finite Scheme. A complete system of events
$A_1, A_2, \ldots, A_n$ where n is finite, together with their probabilities
$p_1, p_2, \ldots, p_n$ is called a finite scheme. By this definition a finite
scheme may be represented by the matrix

$$A = \begin{pmatrix} A_1, & A_2, & \ldots, & A_n \\ p_1, & p_2, & \ldots, & p_n \end{pmatrix}$$

when $A_1 \cup A_2 \cup \cdots \cup A_n = S$ and $A_i \cap A_j = \phi$ for all i and j, $i \neq j$,
$0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$.

2.73. Entropy of a Finite Scheme. Let us consider three
finite schemes

$$\text{(a)} \begin{pmatrix} A_1, & A_2 \\ 0.5, & 0.5 \end{pmatrix} \qquad \text{(b)} \begin{pmatrix} A_1, & A_2 \\ 0.999, & 0.001 \end{pmatrix} \qquad \text{(c)} \begin{pmatrix} A_1, & A_2 \\ 0.4, & 0.6 \end{pmatrix}$$

In (a) $A_1$ and $A_2$ have equal chances of occurrence and therefore
there is great uncertainty about the occurrence of $A_1$ or $A_2$. In
(b) there is a very great chance of $A_1$'s occurrence. So in this
scheme there is less lack of certainty. In (c) the lack of certainty
may be said to be between those in (a) and (b). It is desirable
to have a measure of lack of certainty in a finite scheme. One
of the measures suggested is

$$H(p_1, p_2, \ldots, p_n) = -k \sum_{i=1}^{n} p_i \log p_i$$

where k is a positive constant and H is only a notation for a function of
$p_1, \ldots, p_n$, and $\log p_i$ means the natural logarithm of $p_i$ (if $a^x = b$
then x is the logarithm of b to the base a and is written as
$x = \log_a b$. When a = e the base is not usually written, i.e., $e^x = b$
is written as $x = \log b$. These are called natural logarithms,
where e is a constant approximately equal to 2.71828). $H(p_1, \ldots, p_n)$
is usually called the entropy of the finite scheme

$$\begin{pmatrix} A_1, & \ldots, & A_n \\ p_1, & \ldots, & p_n \end{pmatrix}.$$
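
The measure H can be evaluated for the three schemes (a), (b) and (c) to see that it orders them as the discussion suggests. The sketch below assumes k = 1 and takes the probabilities of scheme (a) to be 0.5 and 0.5, as implied by the statement that its two events have equal chances; the function name `entropy` is illustrative.

```python
from math import log

def entropy(probs, k=1.0):
    """H(p_1, ..., p_n) = -k * sum(p_i * log p_i), with natural logarithms.
    Terms with p_i = 0 are taken as 0 (an assumed convention)."""
    return -k * sum(p * log(p) for p in probs if p > 0)

h_a = entropy([0.5, 0.5])       # scheme (a): equal chances
h_b = entropy([0.999, 0.001])   # scheme (b): A_1 almost certain
h_c = entropy([0.4, 0.6])       # scheme (c): in between

print(round(h_a, 4), round(h_b, 4), round(h_c, 4))
```

The value for (b) is the smallest and the value for (a) is the largest, with (c) in between, which matches the intuitive ordering of the lack of certainty.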


In practical situations k is usually taken to be 1, and H in this
simplified form is called the 'information' in the finite scheme.
The definition of 'information' taken here is used in 'Information
Theory' and in 'Mathematical Theory of Communications'
etc. Some statistical concepts of 'information' will be discussed
in later chapters. Some other areas where probability theory
is widely applied are (1) Markov Processes, (2) Ergodic Theory,
(3) Random Walk, (4) Queuing Theory, (5) Genetics, (6) Space
Research, especially in predicting the operational ability of space
vehicles, region of impact of bombs, rockets etc., (7) Agricultural
production, in testing of crop yields, and in selection of one
variety from the other, (8) Industrial production processes,
especially in controlling the quality of goods, (9) Sociological,
Psychological and Public opinion surveys etc.

Exercises

2.43. Find the information $H(p_1, \ldots, p_n) = -\sum p_i \log p_i$ in the finite
scheme

$$\begin{pmatrix} A_1, & A_2, & A_3, & A_4 \\ 0.2, & 0.4, & 0.3, & 0.1 \end{pmatrix}.$$

2.44. In a discrete noiseless system (a message is not disturbed while
it travels) a coded message source produces a sequence of letters chosen from
among the letters a, b, c, d with probabilities 1/10, 1/5, 2/5, 3/10 respectively,
where successive symbols are chosen independently. What is the entropy per
symbol? Show that it is a maximum when the probabilities are equal. (This
is intuitively the most uncertain situation.)

2.45. Show that the entropy $H(p_1, \ldots, p_n)$ in a finite scheme satisfies
the following conditions.

(1) H is continuous in $p_i$; (2) If all the $p_i$'s are equal, that is, if $p_i =
1/n$ for all i, then H is a monotonic increasing function of n. (This implies
that with equally likely events there is more choice or more uncertainty);
(3) If a choice consists of two successive choices the original H is a
weighted sum of the individual values of H. For example if the
choices are shown in the tree diagram below, then H(1/2, 1/8, 3/8) = H(1/2,
1/2) + (1/2)H(1/4, 3/4) where the weight 1/2 is taken because the second
choice occurs with probability 1/2.


[Tree diagrams. One choice: three branches with probabilities 1/2, 1/8 and 3/8. Two choices: a first choice with branches 1/2 and 1/2, the second branch being followed by a choice with branches 1/4 and 3/4.]

It can be shown that any function satisfying the conditions (1), (2)
and (3) is of the form $-k \sum p_i \log p_i$ where k is a positive constant.
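
The weighted-sum relation quoted in condition (3) of Exercise 2.45 can be checked numerically. The sketch below assumes k = 1 and natural logarithms; it verifies the single numerical instance given in the exercise and is not a proof of the general property.

```python
from math import log, isclose

def H(*probs):
    """Entropy with k = 1 and natural logarithms."""
    return -sum(p * log(p) for p in probs if p > 0)

# One choice among three alternatives ...
lhs = H(1/2, 1/8, 3/8)
# ... versus a first choice (1/2, 1/2) followed, with probability 1/2,
# by a second choice (1/4, 3/4).
rhs = H(1/2, 1/2) + (1/2) * H(1/4, 3/4)

print(isclose(lhs, rhs))   # True
```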
CHAPTER 3


STOCHASTIC VARIABLES

Introduction. It is assumed that the reader is
familiar with mathematical variables, differential and integral
operators etc. In this chapter we will introduce another type of
variable called a stochastic variable, an operator called
mathematical expectation etc. In this chapter only the one variable or
one variate case is considered.

Stochastic Variables. An outcome set was seen to be finite
or infinite, discrete (in the sense of individually distinct) or continuous,
real or hypothetical. It was also noticed in Chapter 2 that
probability is a measure defined on the outcome set S such that the
total measure P(S) = 1. Now we will consider other types of
functions defined on the outcome set. One such function is called a
stochastic variable. A stochastic variable X is a real valued
function defined on the outcome set S such that its domain is the
outcome set and its range is the real line $R = (-\infty, \infty)$. (Here R
means the set of all real numbers from $-\infty$ to $+\infty$. Evidently
these can be represented on a real line.) In this book we will
deal only with real stochastic variables, i.e., functions whose
range is the real line or the set of real numbers. If a stochastic
variable is defined on a discrete outcome set it is called a
discrete stochastic variable and if it is defined on a continuous outcome
set then it is called a continuous stochastic variable. Stochastic
variables are also called chance variables, random variables,
variates etc. For a rigorous definition, see reference [3] at the end of
this chapter.

3.11.1. DISCRETE STOCHASTIC VARIABLES 

Consider the following experiment of throwing a coin twice. If
the occurrence of a head is denoted by 1 and that of a tail by 0, the
outcome set, which is discrete, consists of the outcomes (0, 0), (0, 1),
(1, 0), (1, 1). Let X denote the total number of heads in an outcome
of the experiment. Evidently X is a discrete stochastic variable
defined on S = {(0, 0), (0, 1), (1, 0), (1, 1)}. There can only be 0 or 1
or 2 heads. So the range of X is the set A = {0, 1, 2} and evidently
this set A is contained in the real line $R = (-\infty, \infty)$. For the same
experiment let Y denote the quantity "number of heads minus
number of tails" in the outcomes of this experiment. Evidently Y is a
discrete stochastic variable with the range {-2, 0, 2}.


Comments. Stochastic variables are usually denoted by
X, Y, Z etc., and the values in their ranges by x, y, z etc. For example in
Ex. 3.11.1, if an arbitrary element in the range of X is denoted by x, then
x takes the values 0, 1 and 2. Any number
of stochastic variables may be defined on a given outcome set.

Ex. 3.11.2. Consider the experiment of rolling a die twice. Let X
denote the sum of the two face numbers.
Evidently X is a stochastic variable which assumes the values 2, 3, 4,
..., 12.

Comments. In Ex. 3.11.1 there is a probability corresponding to every
value the discrete stochastic variable takes. (Hereafter we shall
use the abbreviation s.v. for stochastic variable.) In Ex.
3.11.1, X takes the values 0, 1, 2 with probabilities 1/4, 2/4, 1/4
respectively and Y takes the values -2, 0, 2 with probabilities
1/4, 2/4, 1/4 respectively. In Ex. 3.11.2, X takes the values 2, 3,
..., 12 with probabilities 1/36, 2/36, ..., 1/36 respectively. This
probability is called the probability function of a stochastic
variable X and is usually denoted by f(x). In Ex. 3.11.1 probability
functions for X and Y may be defined as follows:

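For the s.v. X,

    x      f(x)
    0      f(0) = 1/4
    1      f(1) = 2/4
    2      f(2) = 1/4

or

$$f(x) = \begin{cases} 1/4 & \text{for } x = 0 \\ 2/4 & \text{for } x = 1 \\ 1/4 & \text{for } x = 2 \end{cases}$$

for the s.v. Y,

    y      f(y)
    -2     f(-2) = 1/4
    0      f(0) = 2/4
    2      f(2) = 1/4

or

$$f(y) = \begin{cases} 1/4 & \text{for } y = -2 \\ 2/4 & \text{for } y = 0 \\ 1/4 & \text{for } y = 2 \end{cases}$$

and for the s.v. X in Ex. 3.11.2,

    x      f(x)
    2      f(2) = 1/36
    3      f(3) = 2/36
    ...    ...
    12     f(12) = 1/36

or

$$f(x) = \begin{cases} 1/36 & \text{for } x = 2 \\ 2/36 & \text{for } x = 3 \\ \;\vdots & \\ 1/36 & \text{for } x = 12. \end{cases}$$

As the range of any s.v. X is the real line according to the definition,
f(x) is taken to be zero at all the remaining points. That is, the s.v. Y
of Ex. 3.11.1, for example, takes the values -2, 0 and 2 with non-zero
probabilities and other values with zero probabilities.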
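
The probability functions above can be obtained by listing the outcome set and counting. The sketch below assumes equally likely outcomes and, for Ex. 3.11.2, takes the experiment to be two rolls of a balanced die; the helper `probability_function` is a hypothetical name used only here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def probability_function(outcomes, variable):
    """Map each value x of the stochastic variable to f(x), assuming
    equally likely outcomes."""
    counts = Counter(variable(o) for o in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

# Ex. 3.11.1: two throws of a coin, 1 = head and 0 = tail.
coin = list(product([0, 1], repeat=2))
f_X = probability_function(coin, lambda o: sum(o))                  # heads
f_Y = probability_function(coin, lambda o: sum(o) - (2 - sum(o)))   # heads - tails

# Ex. 3.11.2: sum of the two face numbers.
dice = list(product(range(1, 7), repeat=2))
f_S = probability_function(dice, lambda o: o[0] + o[1])

print(f_X)   # values 0, 1, 2 with probabilities 1/4, 1/2, 1/4
```

Values outside the listed keys have probability zero, which corresponds to taking f(x) = 0 elsewhere.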

Ex. 3.11.3. Consider a spinner as shown in Fig. 3.1 where
the dial is marked from 0 to 100. Suppose that the spinner is
completely balanced, in the sense that the indicator when rotated is as
likely to stop at any point as at any other point on the dial.

If x denotes the distance in the clockwise direction from the
point 0 to the point where the indicator stopped then X is a
continuous s.v. which can take any value in
the interval 0 to 100. Intuitively, the
probability that the indicator, when
rotated, will stop in between any two
points, say, 25 and 32 is (32 - 25)/100 =
7/100. In other words the probability
that $25 \le x \le 32$ is 7/100. To this
s.v. X we can attach a probability
function f(x) as follows:

$$f(x) = \begin{cases} 1/100 & \text{for } 0 < x < 100 \\ 0 & \text{elsewhere.} \end{cases}$$

Fig. 3.1.


3.12. Graphical Representation. A stochastic variable X
and its probability function f(x) may be represented graphically
by taking x along one axis, say the $X_1$-axis, and f(x) along the
other axis, say the $X_2$-axis. There are different types of diagrams
used frequently for such representations. They are probability
curves, bar diagrams, histograms, pie diagrams, pictograms etc.
The probability function in Ex. 3.11.3, when represented
graphically, gives a curve as shown in Fig. 3.2(a). In general when the
stochastic variable is continuous we can expect the probability
function, when represented graphically, to give a continuous
curve. One such curve is shown in Fig. 3.2(b).

Fig. 3.2(a)    Fig. 3.2(b)

The following sections give some of the usual representations
of discrete probability functions.


3.13. Bar Diagrams. Fig. 3.3 is a bar diagrammatic representation
of the probability function in Ex. 3.11.2. Here bars whose
areas are proportional to the probabilities f(x) are erected over the
corresponding x. For example in Ex. 3.11.2 the s.v. has the range
A = {2, 3, 4, ..., 12} with nonzero probabilities 1/36, 2/36, ..., 1/36
respectively. Thus the area of the bar over the point 2 is proportional
to 1/36, the area of the bar over the point 3 is proportional
to 2/36 etc.

Fig. 3.3.

This bar diagram gives some idea about the
probability function to a layman. This technique of bar diagrammatic
representation is used for such data as the height measurements
of citizens in a city classified according to age groups etc., since
relative frequencies estimate probabilities. Here bars proportional
to the frequency or number of measurements in a particular age
group may be erected over the middle point of the age interval.

3.14. Histograms. In this diagram the probability
function f(x) is represented by rectangles whose areas are proportional
to the various probabilities. These rectangles are erected over the
intervals such that the points 2, 3, ..., 12 are the middle points of
the intervals. Here the probability function f(x) in Ex. 3.11.2 is
represented by a histogram. This same technique may be used in
representing a given classified data in histograms by erecting
rectangles over the various classes such that the areas of the
rectangles are proportional to the frequencies in the various classes of
a classified data. A representation of discrete probabilities by
histograms may give a false impression that x can take all
values in an interval on the real line with nonzero probabilities
since our histograms cover an interval. Here we do not assume
continuity for x but this representation may help a statistician to
get some ideas about the nature of the approximating probability
function if a discrete probability function is approximated by a
continuous probability function for convenience of mathematical
operations.

3.15. Pie Diagrams. If instead of bars or rectangles the
probabilities are represented by sectors in a circle then such a
diagrammatic representation is called a pie diagram.

Fig. 3.5.

3.16. Pictograms. Usually data obtained from surveys,
experiments, economic data, budgetary data etc., are represented by
diagrams so that a non-statistician can have some idea about the
allocations of the data into various subdivisions. For example the
strength of armies of various countries may be represented by pictures
of men, say, one man for every 10,000 men in the army. Such pictorial
representations are called pictograms. In
Fig. 3.6 the production of apples in three countries is represented
in a pictogram to give one some idea about the relative outputs.

Fig. 3.6. (Pictogram of apple production in Country A, Country B and Country C.)

Exercises 

3.1. Define two stochastic variables in each of the following experi¬ 
ments and write down their probability functions. 

(a) A balanced coin is tossed 3 times. 

(b) A balanced die is rolled twice. 

(c) From a set of 20 girls and 30 boys 2 students are selected at
random one by one without replacement.

3.2. Find the probability function of a stochastic variable X = the
number of aces in a hand of bridge (one bridge hand contains 13 cards selected
at random from a well shuffled set of 52 cards).

3.3. Suppose that a machine produces 2% defective items. If x
denotes the number of defectives in a ... randomly selected from
a day's production, find the probability function for x.

3.4. If the probability that a newborn baby in a particular
hospital is a boy is 1/2, find the probability function for a stochastic variable
X = the number of boys in sets of 100 births. Also find the probability that
there are at least 45 boys among the newborn babies in a set of 100 births.

3.2. THE CUMULATIVE DISTRIBUTION

The cumulative distribution or the distribution function F(x) of a
stochastic variable X is the probability that X is less than or equal to
a specified value x, that is, $F(x) = P\{X \le x\}$. This gives the
probability that a stochastic variable X falls at or below a specified
point on the real line $R = (-\infty, \infty)$. For example F(5) = probability
that X assumes a value less than or equal to 5 (that is, $x \le 5$).
Whenever there is no confusion we will use the notations
$F(x_0) = P(X \le x_0)$ or $F(x_0) = P\{x \le x_0\}$
where $x_0$ is a specified value.

Ex. 3.2.1. Find the distribution function of a s.v. X with the
probability function f(x) given as follows:

    x      f(x)
    0      1/8
    1      2/8
    2      3/8
    3      1/8
    4      1/8

and f(x) = 0 elsewhere.

Solution. From the table we can form the cumulative probabilities as
follows:

$F(0) = P\{x \le 0\} = 1/8$
$F(1) = P\{x \le 1\} = 1/8 + 2/8 = 3/8$
$F(2) = P\{x \le 2\} = 1/8 + 2/8 + 3/8 = 6/8$
$F(3) = P\{x \le 3\} = 1/8 + 2/8 + 3/8 + 1/8 = 7/8$
$F(4) = P\{x \le 4\} = 1/8 + 2/8 + 3/8 + 1/8 + 1/8 = 8/8 = 1.$

The cumulative distribution or the distribution function may be
written as

$$F(x) = \begin{cases} 0 & \text{for } x < 0 \text{ or for } -\infty < x < 0 \\ 1/8 & \text{for } x = 0 \text{ or for } 0 \le x < 1 \\ 3/8 & \text{for } x = 1 \text{ or for } 1 \le x < 2 \\ 6/8 & \text{for } x = 2 \text{ or for } 2 \le x < 3 \\ 7/8 & \text{for } x = 3 \text{ or for } 3 \le x < 4 \\ 1 & \text{for } x = 4 \text{ or for } 4 \le x < \infty \end{cases}$$

Comments. In this example the various ways of writing
the distribution function F(x) are given. It may be noticed that
the distribution function varies from 0 to 1.

If the distribution function F(x) of Ex. 3.2.1 is represented
graphically, we get a step function as shown in Fig. 3.7. It may
be further noticed from the discussions so far that distribution
functions of discrete stochastic variables are always step functions.

Fig. 3.7.

Evidently if the distribution function of a continuous
stochastic variable is represented by a graph, we can expect a
smooth curve as shown in Fig. 3.8. The curve may be expected to
be smooth with the maximum ordinate equal to unity and with the
minimum ordinate equal to zero. But the shape of the curve
depends on the probability function of the s.v.

Fig. 3.8.

The following
properties are easily seen for a distribution function whether it is
for a discrete or for a continuous s.v.

(1) $F(-\infty) = 0$

(2) $F(\infty) = 1$    (3.1)

(3) $F(a) \le F(b)$ for $a < b$
In Ex. 3.2.1 the probability that x lies between 1 and 4
(i.e., say, $1 < x \le 4$) is given by 3/8 + 1/8 + 1/8 = 5/8. This may be written
as F(4) - F(1) = 1 - 3/8 = 5/8. In general the probability that x lies
between two points $x_0$ and $x_0 + \Delta x_0$ may be obtained by

$$F(x_0 + \Delta x_0) - F(x_0)$$

(where $x_0$ and $x_0 + \Delta x_0$ are two points; $x_0 + \Delta x_0$ only means a
point different from $x_0$ by a positive quantity $\Delta x_0$. For example
1 may be taken as $x_0$ and 4 may be taken as $x_0 + \Delta x_0$ in Ex. 3.2.1).
This may also be written as

$$\sum_{x_0 < x \le x_0 + \Delta x_0} f(x) = P\{x_0 < X \le x_0 + \Delta x_0\} \tag{3.2}$$

when X is a discrete stochastic variable.

This is equal to the area of the rectangles erected over the
points from $x_0$ to $x_0 + \Delta x_0$ in a histogram if the total area is
assumed to be unity.

3.21. Density function. If F(x) is a continuous function
satisfying the conditions (3.1) and if $\frac{d}{dx}F(x)$ exists almost
everywhere (except for a set of probability measure zero) and if
$f(x) = \frac{d}{dx}F(x)$ then f(x) may be defined as the density function
for a continuous s.v. X for which the distribution function is F(x). It
may be noticed that,

$$\int_{-\infty}^{x} f(x)\,dx = F(x) - F(-\infty) = F(x) - 0 = F(x). \tag{3.3}$$

Thus the probability function for a continuous s.v. X is
often called the density function of X. The probability that
$x_0 < x \le x_0 + \Delta x_0$ may thus be obtained as,

$$\int_{x_0}^{x_0 + \Delta x_0} f(x)\,dx = F(x_0 + \Delta x_0) - F(x_0). \tag{3.4}$$

It may be noticed that when X is a continuous s.v. the
probability that it takes a particular value, say $x_1$, is given by,

$$\int_{x_1}^{x_1} f(x)\,dx = 0 = P\{x = x_1\} \quad \text{when X is continuous.} \tag{3.5}$$

This shows that the probability measure $P\{x = x_1\} = 0$ when X
is continuous. In other words f(x) can be a probability function
for a continuous s.v. X even if f(x) has a discontinuity point
provided the probability measure at that point is zero. It is
intuitively evident that f(x) can have a countable number of
discontinuity points provided the total measure in those points is
zero, that is, if f(x) is continuous almost everywhere (a.e.) and if
$f(x) \ge 0$ for all x then f(x) can be the probability function of a
continuous s.v. X, provided $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Further $\int_{a}^{b} f(x)\,dx$ is the
area under the curve f(x) between the ordinates at x = a and x = b.

For example if f(x) is given as shown in Fig. 3.9 then

$$\int_{a}^{b} f(x)\,dx = \text{the area of the shaded portion.}$$

Fig. 3.9.

In general the probability that x is greater than or equal to
$x_0$ is given by

$$P\{x_0 \le x < \infty\} = \sum_{x_0 \le x < \infty} f(x) \quad \text{when X is a discrete s.v.} \tag{3.6}$$

$$P\{x_0 \le x < \infty\} = \int_{x_0}^{\infty} f(x)\,dx = P\{x_0 < x < \infty\}$$

when X is a continuous s.v.

The probability that |x| is greater than $x_0$, where |x|
denotes the absolute value or magnitude of x (without considering
the sign), is given by

$$\sum_{-\infty < x < -x_0} f(x) + \sum_{x_0 < x < \infty} f(x) \quad \text{when X is a discrete s.v.}$$

and

$$\int_{-\infty}^{-x_0} f(x)\,dx + \int_{x_0}^{\infty} f(x)\,dx \quad \text{when X is a continuous s.v.} \tag{3.7}$$

For a discrete s.v. X, whose probability function f(x) is
represented by the histogram in Fig. 3.10, $P\{|x| > 1\}$ is given by
the shaded area.

Fig. 3.10.

For a continuous s.v. X whose probability function or
density function is represented by the curve in Fig. 3.11, the
$P\{|x| \ge 3\}$ is given by the shaded area.

Fig. 3.11.

From the results discussed so far it is seen that if a function
f(x) is to be a probability function it should satisfy the following
conditions:

1. $f(x) \ge 0$ for all x

2. $\sum f(x) = 1$ if X is a discrete s.v.    (3.8)

3. $\int_{-\infty}^{\infty} f(x)\,dx = 1$ if X is a continuous s.v.

These conditions may be defined as the axioms or postulates
for a probability function (i.e., either the probability function for
a discrete s.v. or the density function for a continuous s.v.)
associated with a stochastic variable X.

Ex. 3.2.2. $f(x) = \dfrac{1}{2^x}$ for x = 1, 2, 3 and f(x) = 0 elsewhere.
Can this be a probability function for a s.v. X?

Solution. f(1) = 1/2, f(2) = 1/4, f(3) = 1/8.

∴ $f(x) \ge 0$ for all x.

$$\sum_{-\infty < x < \infty} f(x) = \sum_{1 \le x \le 3} f(x) = f(1) + f(2) + f(3) = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} = \frac{7}{8} \neq 1.$$

Hence f(x) is not a probability function.

Comments. It may be noticed that

$$f(x) = \frac{1}{2^x} \quad \text{for } x = 1, 2, 3, \ldots$$

and is zero elsewhere, satisfies the conditions for a probability
function, since $\sum_{-\infty < x < \infty} f(x) = \sum_{1 \le x < \infty} \frac{1}{2^x} = 1$ and $f(x) \ge 0$ for all
x. The s.v. X may be easily seen to be discrete since x takes
only individually distinct values.
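
The second condition for a discrete probability function can be checked by direct summation. The sketch below assumes the function of Ex. 3.2.2 is f(x) = 1/2^x, the form named in the Comments, and compares the sum over x = 1, 2, 3 with a long partial sum of the infinite case; the helper `total` is a hypothetical name.

```python
from fractions import Fraction

def total(f, values):
    """Sum of f(x) over the listed values of a discrete s.v."""
    return sum(f(x) for x in values)

def f(x):
    return Fraction(1, 2 ** x)

finite_total = total(f, range(1, 4))      # x = 1, 2, 3
partial = total(f, range(1, 51))          # x = 1, ..., 50

print(finite_total)        # 7/8
print(1 - partial)         # equals Fraction(1, 2**50), the remaining tail
```

With only three values the total falls short of 1, while the partial sums of the infinite case differ from 1 by a tail that halves with every added term.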

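Ex. 3.2.3. A function f(x) is given as follows

$$f(x) = \begin{cases} x & \text{for } 0 < x \le 1 \\ \dfrac{3 - x}{4} & \text{for } 1 < x \le 3 \\ 0 & \text{elsewhere.} \end{cases}$$

Can f(x) be a probability function? If so find the distribution
function.

Solution. Evidently $f(x) \ge 0$ for all x.

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_{0}^{1} x\,dx + \int_{1}^{3} \frac{3 - x}{4}\,dx + \int_{3}^{\infty} 0\,dx = 0 + \frac{1}{2} + \frac{1}{2} + 0 = 1.$$

∴ f(x) is a density function.

By definition the distribution function

$$F(x) = \int_{-\infty}^{x} f(x)\,dx.$$

For any x such that $-\infty < x \le 0$

$$F(x) = \int_{-\infty}^{x} 0\,dx = 0.$$

For any x where $0 < x \le 1$

$$F(x) = \int_{-\infty}^{0} 0\,dx + \int_{0}^{x} x\,dx = \frac{x^2}{2}.$$

For any x where $1 < x \le 3$

$$F(x) = \int_{-\infty}^{0} 0\,dx + \int_{0}^{1} x\,dx + \int_{1}^{x} \frac{3 - x}{4}\,dx = -\frac{1}{8} + \frac{1}{4}\left(3x - \frac{x^2}{2}\right).$$

For any x where $3 < x < \infty$

$$F(x) = \int_{-\infty}^{0} 0\,dx + \int_{0}^{1} x\,dx + \int_{1}^{3} \frac{3 - x}{4}\,dx + \int_{3}^{x} 0\,dx = 1.$$

The distribution function F(x) may be given as

$$F(x) = \begin{cases} 0 & \text{for } -\infty < x \le 0 \\ \dfrac{x^2}{2} & \text{for } 0 < x \le 1 \\ -\dfrac{1}{8} + \dfrac{1}{4}\left(3x - \dfrac{x^2}{2}\right) & \text{for } 1 < x \le 3 \\ 1 & \text{for } 3 < x < \infty. \end{cases}$$

Comments. In this example the form of the density function
is different in different intervals. Therefore in order to find out the
distribution function we had to consider x in the various intervals
separately. If f(x) had only one form throughout the interval
$(-\infty, \infty)$ then $\int_{-\infty}^{x} f(x)\,dx$ can give F(x) directly in the interval
$(-\infty, \infty)$.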
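
The piecewise distribution function of Ex. 3.2.3 can be compared with a direct numerical integration of the density. The sketch below takes the second piece of the density to be (3 - x)/4, which is the form consistent with the total area of 1, and uses a simple midpoint rule; the function names and the step count are assumptions made for this illustration.

```python
def f(x):
    """Density of Ex. 3.2.3 (second piece taken as (3 - x)/4)."""
    if 0 < x <= 1:
        return x
    if 1 < x <= 3:
        return (3 - x) / 4
    return 0.0

def F_closed(x):
    """Closed-form distribution function obtained interval by interval."""
    if x <= 0:
        return 0.0
    if x <= 1:
        return x * x / 2
    if x <= 3:
        return -1 / 8 + (3 * x - x * x / 2) / 4
    return 1.0

def F_numeric(x, lo=-1.0, steps=20_000):
    """Midpoint-rule approximation of the integral of f from lo to x."""
    h = (x - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

# The two versions should agree up to the error of the midpoint rule.
for x in (0.5, 1.0, 2.0, 3.0, 4.0):
    assert abs(F_closed(x) - F_numeric(x)) < 1e-3

print(F_closed(1.0), F_closed(3.0))   # 0.5 1.0
```

The lower limit -1 stands in for minus infinity, which is harmless here because the density is zero for all negative x.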
Ex. 3.2.4. Given the density function

$$f(x) = \begin{cases} k(2 - x) & \text{for } 0 < x < 2 \\ 0 & \text{elsewhere} \end{cases}$$

find k.

Solution. If f(x) is a density function

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{2} k(2 - x)\,dx = k\left[2x - \frac{x^2}{2}\right]_{0}^{2} = 2k = 1$$

or k = 1/2.

Comments. If instead of a density function a probability
function f(x) is given for a discrete s.v. X together with the range of
X, then we can use the result that $\sum_{-\infty < x < \infty} f(x) = 1$ and any unknown
quantity can be evaluated. These discussions indicate that to
any function f(x) which is non-negative, where $\sum_{-\infty < x < \infty} f(x)$ or
$\int_{-\infty}^{\infty} f(x)\,dx$ exists, we can associate a probability function as follows.

Let $\sum_{-\infty < x < \infty} f(x)$ or $\int_{-\infty}^{\infty} f(x)\,dx$ be equal to k. Consider the
function $\phi(x) = f(x)/k$; then $\phi(x)$ satisfies all the conditions for a
probability function.

Ex. 3.2.5. Given that

$f(x, \theta) = \theta e^{-\theta x}$ for $0 < x < \infty$ and

$f(x, \theta) = 0$ elsewhere, where $\theta > 0$ is a constant; can this be a
probability function?

Solution.

$$\int_{-\infty}^{\infty} f(x, \theta)\,dx = 0 + \int_{0}^{\infty} \theta e^{-\theta x}\,dx = \theta \int_{0}^{\infty} e^{-\theta x}\,dx \quad (\text{since } \theta \text{ is a constant})$$

$$= \theta \left[-\frac{e^{-\theta x}}{\theta}\right]_{0}^{\infty} = 1.$$

Also $f(x, \theta) \ge 0$ for all x. Hence $f(x, \theta)$ is a probability function.

Comments. An unspecified constant such as $\theta$ in a probability
function is called a parameter. In this example there is one parameter $\theta$. In
general a probability function may be denoted by $f(x, \theta)$ where $\theta$
denotes all the parameters in the given probability function.
Hereafter this general notation will be used for probability functions
whenever it is convenient. If the parameters in a probability
function are given then the probability function is completely specified.
For example if the above example was given as,

$f(x) = 5e^{-5x}$ for $0 < x < \infty$
and f(x) = 0 elsewhere,

then there is no parameter or f(x) is completely specified. For
various values of $\theta$ we get a number of probability functions, all
having the same functional form. In general, we can say that
$f(x, \theta)$ denotes a family of probability functions.

Exercises 

3.5. A consignment of 30 electric bulbs contains 10 defective ones.
A random sample of 12 is selected from the set. If X denotes the number of
defectives in the sample, find (1) the probability function for X, (2) the
distribution function for X.

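3.6. Can the following functions be probability functions ? If so, find
the corresponding distribution functions.

(a) $f(x) = \begin{cases} 1/4 & \text{for } x=1 \\ 1/2 & \text{for } x=2 \\ 0 & \text{elsewhere.} \end{cases}$

(b) $f(x) = \begin{cases} 1/3 & \text{for } x=-1 \\ 1/3 & \text{for } x=0 \\ 1/3 & \text{for } x=5 \\ 0 & \text{elsewhere.} \end{cases}$

(c) $f(x) = \begin{cases} 2x & \text{for } 0<x<1 \\ 0 & \text{elsewhere.} \end{cases}$

(d) $f(x) = \begin{cases} x & \text{for } 0<x<1 \\ 2-x & \text{for } 1\le x<2 \\ 0 & \text{elsewhere.} \end{cases}$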

3.7. In Problem 3.6 represent the discrete distribution, if there is any,
by bar diagrams and histograms and the continuous distribution, if there is any,
by a curve. Also sketch the distribution functions in these cases.

3.8. Evaluate k, if the following functions are probability functions.

(a) $f(x) = \begin{cases} k/2 & \text{for } x=0 \\ k/4 & \text{for } x=2 \\ k/4 & \text{for } x=5 \\ 0 & \text{elsewhere.} \end{cases}$

(b) $f(x) = \begin{cases} k(1-x^2) & \text{for } 0<x<1 \\ 0 & \text{elsewhere.} \end{cases}$

(c) $f(x) = \begin{cases} 2k\theta e^{-\theta x} & \text{for } 0<x<\infty, \text{ where } \theta>0 \text{ is a constant} \\ 0 & \text{elsewhere.} \end{cases}$


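3.9. If a stochastic variable X has a probability function f(x) as given
below, evaluate and illustrate graphically

(a) $P\{x \le 3\}$,

(b) $P\{|x| < 1.5\}$,

(c) $P\{1<x<3\}$.

$$f(x) = \begin{cases} [\text{illegible}] & \text{for } 0<x\le 1 \\ [\text{illegible}] & \text{for } 1<x\le 2 \\ [\text{illegible}] & \text{for } 2<x\le 3 \\ [\text{illegible}] & \text{for } 3<x<4 \\ 0 & \text{elsewhere.} \end{cases}$$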

3.10. If a stochastic variable X is given by

$$f(x) = \begin{cases} 1/4 & \text{for } x=-2 \\ 1/4 & \text{for } x=0 \\ 1/2 & \text{for } x=5 \\ 0 & \text{elsewhere,} \end{cases}$$

evaluate and illustrate by a histogram

(a) $P\{x < 0\}$,

(b) $P\{x \le 0\}$,

(c) $P\{|x| > 2\}$,

(d) $P\{0 < x < 10\}$.

3.11. If the distribution function F(x) is given to be

$$F(x) = \begin{cases} 2x^2/5 & \text{for } 0<x\le 1 \\ -3/5 + 2(3x - x^2/2)/5 & \text{for } 1<x\le 2 \\ 1 & \text{for } x>2 \end{cases}$$

find the density function f(x) and sketch f(x) and F(x).

3.12. If f(x) is given as

$$f(x) = \begin{cases} 1/8 & \text{for } x=-2 \\ 2/8 & \text{for } x=-1 \\ 3/8 & \text{for } x=0 \\ 2/8 & \text{for } x=2 \\ 0 & \text{elsewhere,} \end{cases}$$

evaluate (1) F(x), (2) F(0) - F(-1), (3) $P\{x > 0\}$.

3.13. Give two examples of a probability density function which has different
functional forms in the following intervals: $-\infty<x<1$,
$1<x<2$, $2<x\le 3$, $3<x<\infty$.

3.14. Give two examples of a distribution function of a continuous
s.v. X which has different functional forms in successive intervals, of the
following intervals: $-\infty<x<0$, $0<x<2$, $2<x<\infty$.


3.3. MATHEMATICAL EXPECTATION 

The mathematical expectation of a function $\psi(X)$ (where $\psi$ is
a Greek letter called psi) of a stochastic variable X is defined as

$$E[\psi(X)] = \sum_{-\infty<x<\infty} \psi(x) f(x) \quad \text{when X is discrete}$$

$$\qquad\quad = \int_{-\infty}^{\infty} \psi(x) f(x)\,dx \quad \text{when X is continuous.} \tag{3.9}$$

Here $\psi(X)$ is a function of X, like $X^2$, $X$, $X(X-1)$ etc. Any
function of a s.v. X need not be a s.v. But in our discussion we will
consider only functions which are s.v.'s. Whenever there is no
confusion $E[\psi(X)]$ will be written as $E\psi(X)$. $E[\psi(X)]$ is read as
expectation of $\psi(X)$ where E denotes 'mathematical expectation'. E
may also be considered to be an operator like the differential
operator D, the integral operator $\int$ etc. This operator E plays a
vital role in statistical analysis. When $\psi(X)=X$,

$$E(X) = \sum_{-\infty<x<\infty} x f(x) \quad \text{when X is discrete}$$

$$\qquad = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{when X is continuous.} \tag{3.10}$$

E(X) is sometimes called the mean value of X. E(X) is a concept
of average which may be seen from the following example.

Ex. 3.3.1. A discrete s.v. X takes the values $x_1, x_2, \ldots, x_n$ with
probabilities $\frac{1}{n}, \frac{1}{n}, \ldots, \frac{1}{n}$ and other values with probability zero.
What is E(X) ?

Solution. By definition

$$E(X) = x_1\cdot\frac{1}{n} + x_2\cdot\frac{1}{n} + \cdots + x_n\cdot\frac{1}{n} = (x_1 + x_2 + \cdots + x_n)/n.$$


introduction to statistical mathematics

Ex. 3.3.2. Consider the following situation. A balanced die with
the faces marked 1, 2, 3, ..., 6 is rolled and a person gets a
sum of money, in dollars, equal to the square of the number on
the face that turns up. The game is repeated a number of times. In the long run, i.e.,
as the number of trials tends to infinity or, approximately, when
the game is repeated a very large number of times, how much
money can he expect on the average per game ?

Solution.

When a balanced die is rolled the numbers 1, 2, 3, 4, 5, 6 can
come with probabilities 1/6, 1/6, ..., 1/6. If the number 4 comes at a
trial he gets the square of 4, that is, 16 dollars. When the game is repeated
a large number of times the relative frequencies of 1, 2, 3, ..., 6
approximate to 1/6, 1/6, ..., 1/6. Therefore the amount of
money that he gets on the average per game is

$$\$\left(1^2\times\tfrac{1}{6} + 2^2\times\tfrac{1}{6} + \cdots + 6^2\times\tfrac{1}{6}\right) = \$\tfrac{1}{6}\left(1^2+2^2+3^2+4^2+5^2+6^2\right) = \$15.17.$$

Comments. It is easily seen that if X is a s.v. which is defined
as the square of the number rolled, then the amount the
person gets on the average is

$$E(X) = \sum x f(x) = 1^2\times\tfrac{1}{6} + 2^2\times\tfrac{1}{6} + \cdots + 6^2\times\tfrac{1}{6} = 15.17.$$

From this example it is seen that E(X) is a concept of
average.
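The computation in the die example can be sketched in a few lines. The helper name `expectation` is an assumption made for this illustration; it simply evaluates the sum of $\psi(x) f(x)$ in definition (3.9) for a discrete s.v.

```python
from fractions import Fraction

def expectation(pmf, psi=lambda x: x):
    """E[psi(X)] = sum of psi(x) * f(x) over the values of a discrete s.v."""
    return sum(psi(x) * p for x, p in pmf.items())

# Balanced die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The payoff is the square of the number rolled.
winnings = expectation(die, psi=lambda x: x ** 2)
print(winnings, round(float(winnings), 2))
```

The exact value is 91/6, which rounds to the 15.17 obtained in the example.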

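Ex. 3.3.3. Find E(X) where X is defined as follows

$$f(x) = \begin{cases} 0 & \text{for } x<0 \\ x & \text{for } 0\le x<1 \\ [\text{illegible}] & \text{for } 1<x<3 \\ 0 & \text{for } x>3. \end{cases}$$

Solution.

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-\infty}^{0} x\cdot 0\,dx + \int_{0}^{1} x\cdot x\,dx + \int_{1}^{3} x\cdot[\text{illegible}]\,dx + \int_{3}^{\infty} x\cdot 0\,dx.$$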

3.31. Moments. E(X) is usually called the first moment
about the origin and is denoted by $\mu_1'$ ($\mu$ is a
Greek letter called mu and $\mu_1'$ is read as mu one prime).

$$\mu = E(X) = \mu_1'.$$

In general the $r$-th moment about the origin (sometimes called
the $r$-th raw moment) is $E(X^r)$ and is usually denoted by $\mu_r'$
(called mu r prime).

$$\mu_r' = E(X^r) = \sum_{-\infty<x<\infty} x^r f(x) \quad \text{when X is discrete} \tag{3.11}$$

$$\qquad\qquad\quad = \int_{-\infty}^{\infty} x^r f(x)\,dx \quad \text{when X is continuous}$$

i.e., First moment about the origin $= \mu_1' = E(X)$.

Second moment about the origin $= \mu_2' = E(X^2)$ etc.

Theorem 3.3.1. E(c) = c where c is a constant with respect
to a s.v. X.

Proof. By definition

$$E(c) = \int_{-\infty}^{\infty} c\, f(x)\,dx \quad \text{when X is continuous}$$

$$= c\int_{-\infty}^{\infty} f(x)\,dx = c\cdot 1 \quad \left(\text{since } \int_{-\infty}^{\infty} f(x)\,dx = 1\right)$$

$$= c.$$

The proof when X is discrete is left to the reader.

Theorem 3.3.2. $E[c\,\psi(X)] = c\,E[\psi(X)]$ where c is a
constant.

Proof.

$$E[c\,\psi(X)] = \sum_{-\infty<x<\infty} c\,\psi(x) f(x) \quad \text{when X is discrete}$$

$$= c\sum_{-\infty<x<\infty} \psi(x) f(x) \quad \text{(since c is a constant)}$$

$$= c\,E[\psi(X)] \quad \text{(by definition)} \tag{3.12}$$

The proof when X is continuous is left to the reader.

Corollary. $E[cX] = c\,E(X) = c\,\mu_1'$ where c is a constant. (3.13)

Theorem 3.3.3. $E(aX+b) = a\,E(X) + b$, where a and b are
constants. (3.14)

Proof.

$$E(aX+b) = \int_{-\infty}^{\infty} (ax+b) f(x)\,dx \quad \text{when X is continuous}$$

$$= \int_{-\infty}^{\infty} ax\, f(x)\,dx + \int_{-\infty}^{\infty} b\, f(x)\,dx$$

$$= a\int_{-\infty}^{\infty} x\, f(x)\,dx + b\int_{-\infty}^{\infty} f(x)\,dx \quad \text{(since a and b are constants)}$$

$$= a\,E(X) + b.$$

The proof when X is discrete is left to the reader.

Corollary 1. $E[a\,\psi(X) + b\,\phi(X)] = a\,E\psi(X) + b\,E\phi(X)$ where
a and b are constants, $\psi(X)$ and $\phi(X)$ are functions of X.

Corollary 2. $E[X - E(X)] = E[X - \mu] = 0$. (3.15)

3.32. Central Moments. The $r$-th moment about a point c may
be defined as $E(X-c)^r$ where c is a constant. When c is $E(X)=\mu$
then the $r$-th moment about E(X) is obtained. This is usually called
the $r$-th central moment and is usually denoted by $\mu_r$.

$$\mu_r = E(X-\mu)^r \quad \text{where } \mu = E(X)$$

$$\quad = \int_{-\infty}^{\infty} (x-\mu)^r f(x)\,dx \quad \text{when X is continuous}$$

$$\quad = \sum_{-\infty<x<\infty} (x-\mu)^r f(x) \quad \text{when X is discrete} \tag{3.16}$$

For example

$$\mu_2 = E(X-\mu)^2 = E[X^2 - 2\mu X + \mu^2] = E(X^2) - 2\mu E(X) + E(\mu^2)$$

$$\quad = E(X^2) - \mu^2 \quad (\text{since } E(X)=\mu \text{ and } E\mu^2 = \mu^2) \tag{3.17}$$

$\mu_2$ is sometimes called the variance of the stochastic variable
X [Var(X)]. The positive square root of the variance (i.e., $\sqrt{\mu_2}$) is
called the standard deviation of the stochastic variable X and is
usually denoted by $\sigma$.

$$\sigma = \sqrt{\mu_2} = [E(X-\mu)^2]^{1/2} = \{E[X-E(X)]^2\}^{1/2} = [\mathrm{Var}(X)]^{1/2}.$$


It may be noted that $E(X^2)$ need not be equal to $[E(X)]^2$. The
standard deviation is the root mean square deviation when the square
deviations are taken from $E(X) = \mu$.

Theorem 3.3.4. $\mathrm{Var}(aX+b) = a^2\,\mathrm{Var}(X)$, where a and b are
constants.

Proof. By definition

$$\mathrm{Var}(aX+b) = E[aX + b - aE(X) - b]^2 = E[a\{X - E(X)\}]^2 = E\,a^2[X-E(X)]^2$$

$$= a^2\,E[X-E(X)]^2 \quad \text{(since a is a constant)}$$

$$= a^2\,\mathrm{Var}(X). \tag{3.18}$$

Corollary. Var(X+b) = Var(X) where b is a constant.

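Theorem 3.3.5.

$$\mu_r = \mu_r' - \binom{r}{1}\mu_{r-1}'\,\mu + \binom{r}{2}\mu_{r-2}'\,\mu^2 - \cdots + (-1)^r\mu^r$$

where $\mu_r$ is the $r$-th central moment, $\mu_r'$ is the
$r$-th raw moment and $\mu$ is the first raw moment, or $E(X) = \mu = \mu_1'$.

Proof. By definition

$$\mu_r = E[X-\mu]^r$$

where $\mu = E(X)$ and r is a positive integer.

But

$$(X-\mu)^r = X^r - \binom{r}{1}X^{r-1}\mu + \binom{r}{2}X^{r-2}\mu^2 - \cdots + (-1)^r\mu^r.$$

Therefore

$$\mu_r = E(X-\mu)^r = E(X^r) - \binom{r}{1}\mu\,E(X^{r-1}) + \binom{r}{2}\mu^2\,E(X^{r-2}) - \cdots + (-1)^r\mu^r\,E(1)$$

$$\quad = \mu_r' - \binom{r}{1}\mu_{r-1}'\,\mu + \binom{r}{2}\mu_{r-2}'\,\mu^2 - \cdots + (-1)^r\mu^r.$$

Corollary 1. $\mu_2 = \mu_2' - \mu^2$. (3.19)

Corollary 2. $\mu_3 = \mu_3' - 3\mu_2'\,\mu + 3\mu_1'\,\mu^2 - \mu^3 = \mu_3' - 3\mu_2'\,\mu + 2\mu^3$. (3.20)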
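Theorem 3.3.5 translates directly into a small routine. The sketch below is only an illustration: the function name `central_from_raw` and the three-point distribution are assumptions, and the raw moments are passed as a list whose entry j holds the j-th raw moment (entry 0 being E(1) = 1).

```python
from fractions import Fraction
from math import comb

def central_from_raw(raw, r):
    """r-th central moment from raw moments, following Theorem 3.3.5.

    raw[j] is the j-th raw moment, with raw[0] = 1 and raw[1] = mu.
    """
    mu = raw[1]
    return sum((-1) ** j * comb(r, j) * raw[r - j] * mu ** j
               for j in range(r + 1))

# Assumed three-point distribution: values 0, 1, 2 with
# probabilities 1/4, 1/2, 1/4.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
raw = [sum(x ** j * p for x, p in pmf.items()) for j in range(4)]

print(central_from_raw(raw, 2))  # variance
print(central_from_raw(raw, 3))  # third central moment
```

The two printed values can be compared with Corollaries 1 and 2 evaluated by hand for the same distribution.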


There are other types of moments which will be briefly discussed
in the following sections.

3.33. Absolute Moments. The $r$-th absolute moment $M_r$
about a point c is defined as $E|X-c|^r$ where c is a constant.
For example the first three absolute moments about c are

$$M_1 = E|X-c| \tag{3.21}$$

$$M_2 = E|X-c|^2 \tag{3.22}$$

$$M_3 = E|X-c|^3 \text{ etc.} \tag{3.23}$$

When c is $E(X) = \mu$ the absolute moments are said to be absolute
moments about $\mu$. For example the second absolute moment about
$\mu$ is

$$M_2 = E|X-\mu|^2 = E(X-\mu)^2 = \mu_2.$$

(It may be noticed that $(X-\mu)^2 = (\mu-X)^2 = |X-\mu|^2$ since
we are dealing only with real quantities.)

The first absolute moment from c is called the mean absolute
deviation or mean deviation of the s.v. X from c.

Mean deviation (abbreviation, M.D.) $= M_1 = E|X-c|$.

It is evident that $M_1$ need not be equal to $\sqrt{M_2}$.


Ex. 3.33.1. Find (1) the mean deviation from $\mu = E(X)$, (2)
the standard deviation, (3) the third raw moment, (4) the third
absolute moment from $\mu$, for the following probability function.

$$f(x) = \begin{cases} 1/4 & \text{for } x=0 \\ 1/2 & \text{for } x=1 \\ 1/4 & \text{for } x=2 \\ 0 & \text{elsewhere} \end{cases}$$

Solution. $\mu = E(X) = 0\times\tfrac14 + 1\times\tfrac12 + 2\times\tfrac14 = 1$.

(1) The mean deviation from $\mu$ is

$$M_1 = E|X-\mu| = |0-1|\times\tfrac14 + |1-1|\times\tfrac12 + |2-1|\times\tfrac14 = \tfrac14 + 0 + \tfrac14 = 1/2.$$

(2) The standard deviation $= \sqrt{\mu_2} = [E(X-\mu)^2]^{1/2}$.

$$\mu_2 = (0-1)^2\times\tfrac14 + (1-1)^2\times\tfrac12 + (2-1)^2\times\tfrac14 = \tfrac14 + \tfrac14 = 1/2.$$

The standard deviation is

$$\sigma = \sqrt{1/2} = 1/\sqrt{2}.$$

(3) The third raw moment is

$$\mu_3' = E(X^3) = 0^3\times\tfrac14 + 1^3\times\tfrac12 + 2^3\times\tfrac14 = 5/2.$$

(4) The third absolute moment about $\mu$ is

$$M_3 = E|X-\mu|^3 = |0-1|^3\times\tfrac14 + |1-1|^3\times\tfrac12 + |2-1|^3\times\tfrac14 = \tfrac14 + 0 + \tfrac14 = 1/2.$$
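The four quantities of this example can be checked mechanically. The sketch below assumes the probability function takes the values 1/4, 1/2, 1/4 at x = 0, 1, 2 (the values used in the solution); the helper `E` is a name introduced only for this illustration.

```python
from fractions import Fraction

# Probability function of the example: f(0)=1/4, f(1)=1/2, f(2)=1/4.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def E(psi):
    """Expectation of psi(X) for the discrete s.v. described by pmf."""
    return sum(psi(x) * p for x, p in pmf.items())

mu = E(lambda x: x)
mean_deviation = E(lambda x: abs(x - mu))    # first absolute moment about mu
variance = E(lambda x: (x - mu) ** 2)        # second central moment
third_raw = E(lambda x: x ** 3)              # third raw moment
third_abs = E(lambda x: abs(x - mu) ** 3)    # third absolute moment about mu
std_dev = float(variance) ** 0.5

print(mu, mean_deviation, variance, third_raw, third_abs, std_dev)
```

Only the standard deviation is converted to a floating-point number, since it is the one quantity that is not rational here.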

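Ex. 3.33.2. Find (1) the mean deviation from $\theta$, (2) the standard
deviation, (3) the second raw moment for the following probability
function.

$$f(x) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0<x<\theta \\ 0 & \text{elsewhere, where } \theta>0 \text{ is a parameter.} \end{cases}$$

Solution. (1) By definition the mean deviation from $\theta$ is

$$E|x-\theta| = \int_0^{\theta} |x-\theta|\,\frac{1}{\theta}\,dx = \frac{1}{\theta}\int_0^{\theta} |x-\theta|\,dx.$$

But $x<\theta$ always; therefore $|x-\theta| = \theta - x$, and

$$E|x-\theta| = \frac{1}{\theta}\int_0^{\theta} (\theta-x)\,dx = \frac{1}{\theta}\left[\theta\int_0^{\theta} dx - \int_0^{\theta} x\,dx\right] = \frac{1}{\theta}\left[\theta^2 - \frac{\theta^2}{2}\right] = \theta/2.$$

(2) Standard deviation $= \sqrt{\mu_2} = \{E[X-E(X)]^2\}^{1/2}$.

$$E(X) = \int_0^{\theta} x\,\frac{1}{\theta}\,dx = \frac{1}{\theta}\int_0^{\theta} x\,dx = \theta/2.$$

$$\mu_2 = E(X-\theta/2)^2 = EX^2 - (\theta/2)^2$$

$$E(X^2) = \int_0^{\theta} x^2\,\frac{1}{\theta}\,dx = \frac{1}{\theta}\left[\frac{x^3}{3}\right]_0^{\theta} = \theta^2/3$$

$$\mu_2 = \frac{\theta^2}{3} - \frac{\theta^2}{4} = \theta^2/12.$$

The standard deviation is

$$\sqrt{\mu_2} = \sqrt{\theta^2/12} = \theta/\sqrt{12}.$$

(3) The second raw moment

$$= E(X^2) = \theta^2/3.$$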

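Theorem 3.3.6.

$$E\left(\frac{X-\mu}{\sigma}\right) = 0 \quad \text{and} \quad \mathrm{Var}\left(\frac{X-\mu}{\sigma}\right) = 1$$

where $\mu = E(X)$ and $\sigma = \sqrt{\mu_2}$ ($\sigma \ne 0$ is assumed).

Proof. Let $Y = \dfrac{X-\mu}{\sigma}$. Then

$$E(Y) = E\left(\frac{X-\mu}{\sigma}\right) = \frac{1}{\sigma}E(X-\mu) = \frac{1}{\sigma}[E(X) - E(\mu)] = \frac{1}{\sigma}[\mu-\mu] = 0 \tag{3.24}$$

$$\mathrm{Var}(Y) = E[Y-E(Y)]^2 = E\left[\frac{X-\mu}{\sigma} - 0\right]^2 = \frac{1}{\sigma^2}E(X-\mu)^2 = \frac{1}{\sigma^2}[\mathrm{Var}(X)] = \frac{\sigma^2}{\sigma^2} = 1. \tag{3.25}$$

A stochastic variable whose expected value is zero and whose variance
is unity is called a standardized stochastic variable.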

3.34. Factorial moments. The $r$-th factorial moment of
a stochastic variable X is defined as the expected value of
$X(X-1)\cdots(X-r+1)$ and is usually denoted by $\mu_{[r]}$.

$$\mu_{[r]} = E[X(X-1)(X-2)\cdots(X-r+1)]$$

$$\quad = \int_{-\infty}^{\infty} x(x-1)\cdots(x-r+1) f(x)\,dx \quad \text{when X is continuous} \tag{3.26}$$

$$\quad = \sum_{-\infty<x<\infty} x(x-1)\cdots(x-r+1) f(x) \quad \text{when X is discrete.}$$

Ex. 3.34.1. Find the second factorial moment of the stochastic
variable X whose probability function is defined as

$$f(x) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0<x<\theta, \ \theta>0 \text{ is a parameter} \\ 0 & \text{elsewhere.} \end{cases}$$

This is known as a uniform or rectangular distribution with
one parameter.

Sol. By definition the second factorial moment is

$$\mu_{[2]} = E[X(X-1)] = \int_{-\infty}^{\infty} x(x-1) f(x)\,dx = \int_0^{\theta} x(x-1)\,\frac{1}{\theta}\,dx$$

$$= \frac{1}{\theta}\int_0^{\theta} (x^2 - x)\,dx = \frac{1}{\theta}\left[\frac{\theta^3}{3} - \frac{\theta^2}{2}\right] = \frac{\theta^2}{3} - \frac{\theta}{2}.$$

Comments. $\mu_{[2]}$ may be expressed in terms of $\mu_2'$ and $\mu_1'$:

$$\mu_{[2]} = EX(X-1) = E(X^2 - X) = EX^2 - EX = \mu_2' - \mu_1'$$

$$\mu_2' = \mu_2 + (\mu_1')^2$$

$$\mu_2 = \mu_{[2]} + \mu_1' - (\mu_1')^2 = \mu_{[2]} + \mu - \mu^2$$

where $\mu = \mu_1' = E(X)$. (3.27)

Ex. 3.34.2. Find the first and second factorial moments for
the stochastic variable whose probability function is given as

$$f(x) = \begin{cases} \dfrac{\lambda^x}{x!}\,e^{-\lambda} & \text{for } x = 0, 1, 2, \ldots, \ \lambda>0 \text{ is a parameter} \\ 0 & \text{elsewhere} \end{cases}$$

This distribution is known as the Poisson distribution with the
parameter $\lambda$ (lambda).

Sol. The first factorial moment is

$$\mu_{[1]} = E(X) = \mu_1' = \mu$$

$$\mu_{[1]} = \sum_{-\infty<x<\infty} x f(x) = \sum_{x=0}^{\infty} x\,\frac{\lambda^x}{x!}\,e^{-\lambda} = e^{-\lambda}\sum_{x=1}^{\infty} x\,\frac{\lambda^x}{x!}$$

(when x = 0 the corresponding term is zero)

$$= e^{-\lambda}\,\lambda\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = e^{-\lambda}\,\lambda\left[1 + \frac{\lambda}{1!} + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right]$$

$$= e^{-\lambda}\,\lambda\,e^{\lambda} = \lambda \tag{3.28}$$

$$\left(\text{since } e^{\lambda} = 1 + \frac{\lambda}{1!} + \frac{\lambda^2}{2!} + \cdots\right)$$

The second factorial moment is

$$\mu_{[2]} = EX(X-1) = \sum_{x=0}^{\infty} x(x-1)\,\frac{\lambda^x}{x!}\,e^{-\lambda} = \sum_{x=2}^{\infty} x(x-1)\,\frac{\lambda^x}{x!}\,e^{-\lambda}$$

(the terms corresponding to x = 0 and 1 are zeros)

$$= \lambda^2 e^{-\lambda}\sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!} = \lambda^2 e^{-\lambda} e^{\lambda} = \lambda^2. \tag{3.29}$$

Comments. It can be easily verified that, for discrete
stochastic variables, factorial moments are usually easier to
evaluate.
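The results $\mu_{[1]} = \lambda$ and $\mu_{[2]} = \lambda^2$ for the Poisson distribution can be checked numerically by truncating the infinite sums. In the sketch below the value of lambda, the number of terms kept and the function name are all assumptions chosen for illustration.

```python
from math import exp

def poisson_terms(lam, n_terms):
    """Yield (x, f(x)) for x = 0, ..., n_terms - 1.

    Uses the recurrence f(x + 1) = f(x) * lam / (x + 1), which avoids
    computing large factorials directly.
    """
    p = exp(-lam)
    for x in range(n_terms):
        yield x, p
        p *= lam / (x + 1)

lam = 2.5                      # assumed parameter value
terms = list(poisson_terms(lam, 100))

first_factorial = sum(x * p for x, p in terms)
second_factorial = sum(x * (x - 1) * p for x, p in terms)
print(first_factorial, second_factorial)
```

With 100 terms the neglected tail is far below floating-point precision for this value of lambda, so the two sums should agree with lambda and lambda squared up to rounding error.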

A stochastic variable X together with its probability function
f(x) is sometimes called the probability distribution of the
stochastic variable X. This is evidently different from the distribution
function, which is the cumulative probability function.
For example Ex. 3.34.1 may be stated as follows. Find the
second factorial moment for the following distribution

$$X : f(x) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0<x<\theta, \ \theta>0 \\ 0 & \text{elsewhere} \end{cases}$$

This means that the stochastic variable X has the probability
function f(x) which is defined as above. In this distribution
(stochastic variable together with its probability function) there
is one parameter $\theta$.

We defined various types of moments. But these moments need not
always exist. A moment is said to exist if it is a finite
quantity. Consider the following example.

Ex. 3.34.3. Evaluate $E(2^X)$ for the probability distribution

$$X : f(x) = \begin{cases} \dfrac{1}{2^x} & \text{for } x = 1, 2, \ldots \\ 0 & \text{elsewhere} \end{cases}$$

Sol. By definition

$$E(2^X) = \sum_{-\infty<x<\infty} 2^x f(x) = \sum_{1\le x<\infty} 2^x\,\frac{1}{2^x} = 2\times\frac{1}{2} + 2^2\times\frac{1}{2^2} + \cdots = 1 + 1 + 1 + \cdots = \infty \tag{3.30}$$

Evidently the series 1 + 1 + ... does not converge and therefore
$E(2^X)$ for this distribution does not exist.

Comments. A series $a_1 + a_2 + \cdots$ is said to be convergent if

$$a_1 + a_2 + \cdots + a_n \to k \quad \text{as } n \to \infty$$

where k is a finite quantity. Otherwise the series is not convergent.
If

$$a_1 + a_2 + \cdots + a_n \to \pm\infty \quad \text{as } n \to \infty$$

then the series is said to be divergent. Therefore, depending
upon the probability distribution, moments may or may not exist.
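A quick way to see the divergence is to look at partial sums. The sketch below assumes the probability function f(x) = 1/2^x for x = 1, 2, ... used in the example; the function names are introduced only for this illustration.

```python
from fractions import Fraction

def f(x):
    """Probability function of the example: f(x) = 1 / 2**x for x = 1, 2, ..."""
    return Fraction(1, 2 ** x)

def partial_expectation(n):
    """Sum of 2**x * f(x) for x = 1, ..., n."""
    return sum(2 ** x * f(x) for x in range(1, n + 1))

def partial_probability(n):
    """Sum of f(x) for x = 1, ..., n; this one does converge (to 1)."""
    return sum(f(x) for x in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, partial_expectation(n), float(partial_probability(n)))
```

The partial sums of the expectation grow without bound as more terms are included, while the probabilities themselves add up to a finite total, which is the situation described in the comments above.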

3.35. Moment Generating Functions. If X is a stochastic
variable then $E(e^{tX})$ is called the moment generating function
(abbreviation M.G.F.) of X, where t is an arbitrary real constant
with respect to the stochastic variable X. It will be seen in the
following discussion that this M.G.F. gives the various raw
moments and hence it is called a M.G.F. It is usually denoted
by $M_X(t)$.

$$M_X(t) = E(e^{tX}) = E\left[1 + \frac{tX}{1!} + \frac{t^2X^2}{2!} + \frac{t^3X^3}{3!} + \cdots\right]$$

$$= 1 + \frac{t}{1!}E(X) + \frac{t^2}{2!}E(X^2) + \cdots$$

$$= 1 + t\,\mu_1' + \frac{t^2}{2!}\,\mu_2' + \cdots \tag{3.31}$$

i.e., the coefficient of $\dfrac{t^r}{r!}$ in $M_X(t)$ gives the $r$-th raw moment.
That is, the raw moments are the coefficients of
$\dfrac{t}{1!}, \dfrac{t^2}{2!}, \dfrac{t^3}{3!}, \ldots$ when $M_X(t)$ exists.
This M.G.F. $M_X(t)$ exists if the series

$$1 + t\,\mu_1' + \frac{t^2}{2!}\,\mu_2' + \cdots$$

is convergent for some non-zero t.

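Ex. 3.35.1. Obtain the M.G.F. of the Binomial probability
distribution with the parameters N and p. The Binomial probability
distribution with the parameters N and p is given as

$$f(x,\theta) = \begin{cases} \dbinom{N}{x} p^x (1-p)^{N-x} & \text{for } x = 0, 1, \ldots, N \\ 0 & \text{elsewhere,} \quad 0<p<1 \end{cases}$$

Here $\theta$ represents the two parameters N and p.

Sol. $M_X(t)$ for the Binomial variate X is

$$M_X(t) = E(e^{tX}) = \sum_{x=0}^{N} e^{tx}\binom{N}{x} p^x (1-p)^{N-x} = \sum_{x=0}^{N} \binom{N}{x} (pe^t)^x (1-p)^{N-x}.$$

Let $1-p = q$ and $pe^t = p'$; then

$$M_X(t) = \sum_{x=0}^{N} \binom{N}{x} (p')^x q^{N-x} = (q+p')^N$$

(since $(q+p')^N$ when expanded by the Binomial expansion gives
the sum above)

$$= (q + pe^t)^N. \tag{3.32}$$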

Theorem 3.3.7. $M_{X+a}(t) = e^{ta} M_X(t)$ where a is a constant,
or the M.G.F. of a stochastic variable Y = X + a is $e^{ta}$ times the
M.G.F. of the stochastic variable X.

Proof.

$$M_{X+a}(t) = E\,e^{t(X+a)} = E\,e^{tX+ta} = E\,e^{tX}e^{ta} = e^{ta}\,E\,e^{tX} \quad (\text{since } e^{ta} \text{ is a constant})$$

$$= e^{ta} M_X(t). \tag{3.33}$$

Corollary. $M_{X-\mu}(t) = e^{-t\mu} M_X(t)$ where $\mu = E(X) = \mu_1'$. (3.34)

This gives a relationship between the central moments and
raw moments.

Theorem 3.3.8.

$$\mu_r' = \frac{d^r}{dt^r} M_X(t)\Big|_{t=0} \tag{3.35}$$

or the $r$-th raw moment is obtained by differentiating $M_X(t)$ r times
with respect to t and substituting t = 0, or $\mu_r'$ is the $r$-th derivative
at t = 0 of $M_X(t)$.

The proof is left to the reader.

Hint. Consider the series $M_X(t)$ and differentiate.

Ex. 3.35.2. Obtain the first moment $\mu = E(X)$ from the M.G.F.
in Ex. 3.35.1.

Sol. The M.G.F. in Ex. 3.35.1 is seen to be

$$M_X(t) = (q + pe^t)^N \quad \text{where } q = 1-p.$$

$$\frac{d}{dt} M_X(t) = N(q + pe^t)^{N-1}\,pe^t$$

$$\frac{d}{dt} M_X(t)\Big|_{t=0} = N(q+p)^{N-1}\,p \quad (\text{since } e^t = 1 \text{ when } t = 0)$$

$$= Np \quad (\text{since } q + p = 1)$$

$$\mu_1' = \mu = Np. \tag{3.36}$$

Comments. Similarly other raw moments may be easily
obtained by successive differentiation of $(q + pe^t)^N$ with respect
to t.
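The rule "differentiate the M.G.F. and put t = 0" can be imitated numerically with a central difference quotient. The sketch below is only an illustration: the parameter values, the step size h and the function names are assumptions, and the derivative is approximate rather than exact.

```python
from math import exp

def mgf_binomial(t, N, p):
    """M.G.F. of the Binomial distribution, (q + p e^t)^N with q = 1 - p."""
    return (1 - p + p * exp(t)) ** N

def derivative_at_zero(g, h=1e-5):
    """Central difference approximation to g'(0)."""
    return (g(h) - g(-h)) / (2 * h)

N, p = 10, 0.3   # assumed parameter values
first_moment = derivative_at_zero(lambda t: mgf_binomial(t, N, p))
print(first_moment, N * p)
```

The printed approximation should agree with Np to several decimal places; higher raw moments would need higher-order difference quotients, which become numerically delicate for very small h.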

Theorem 3.3.9.

$$M_{aX+b}(t) = e^{tb} M_X(ta) \tag{3.37}$$

where a and b are constants.

Proof. By definition,

$$M_{aX+b}(t) = E\,e^{t(aX+b)} = E\,e^{taX}e^{tb} = e^{tb}\,E\,e^{ta\cdot X} = e^{tb} M_X(ta).$$

Corollary.

$$M_{\frac{X-\mu}{\sigma}}(t) = e^{-t\mu/\sigma}\,M_X(t/\sigma) \tag{3.38}$$

This gives a relation between the raw moments of the
standardized variate and those of the original variate.

Sometimes the M.G.F. $M_X(t)$ of a s.v. X may not exist
but in such cases another function called the characteristic
function $\phi_X(t)$ of a s.v. X exists. The characteristic function
$\phi_X(t)$ of a s.v. X is defined as

$$\phi_X(t) = E\,e^{itX} \quad \text{where } i = \sqrt{-1}$$

and t is an arbitrary real constant. Here also it may be seen
that the coefficient of $\dfrac{(it)^r}{r!}$ in the expansion of $\phi_X(t)$ gives the
$r$-th raw moment $\mu_r'$; i.e., $\phi_X(t)$ also generates the moments. The
characteristic function exists always and it also generates the
raw or crude moments.

So far we have defined various types of moments, the
M.G.F. and the characteristic function of a s.v. X. For a given
probability function, $M_X(t)$ and $\phi_X(t)$ may be evaluated, if $M_X(t)$
exists. It is natural to ask the question whether there exists more
than one probability function corresponding to a M.G.F. or a
characteristic function. The answer is no ! The M.G.F. and the
characteristic function uniquely determine a probability function.
For a proof of this result the reader may refer to Mathematical
Methods of Statistics by H. Cramér. The uniqueness theorem
gives the uniqueness of the probability distribution, if the M.G.F.
or the characteristic function is given.

Ex. 3.35.3. The M.G.F. of a probability distribution is given
to be $\frac{1}{81}(2+e^t)^4$. What is the probability distribution ?

Sol.

$$\frac{1}{81}(2+e^t)^4 = \left(\frac{2}{3} + \frac{1}{3}e^t\right)^4$$

This is of the form $(q+pe^t)^N$ where q = 2/3, p = 1/3 and N = 4.
Since the M.G.F. uniquely determines the corresponding probability
function, the corresponding distribution is a Binomial distribution
with the parameters N = 4 and p = 1/3. (See Ex. 3.35.1)
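The identification in this example can be cross-checked by computing the M.G.F. directly from the Binomial probability function with N = 4 and p = 1/3 and comparing it with the given expression at a few values of t. The function names and the test points are assumptions made for this sketch; agreement at a handful of points is a sanity check, not a proof of uniqueness.

```python
from math import comb, exp, isclose

def given_mgf(t):
    """The M.G.F. stated in the example: (2 + e^t)^4 / 81."""
    return (2 + exp(t)) ** 4 / 81

def mgf_from_binomial_pmf(t, N=4, p=1 / 3):
    """E(e^{tX}) summed term by term over the Binomial probability function."""
    return sum(exp(t * x) * comb(N, x) * p ** x * (1 - p) ** (N - x)
               for x in range(N + 1))

test_points = (-1.0, 0.0, 0.5, 2.0)   # arbitrary values of t
checks = [isclose(given_mgf(t), mgf_from_binomial_pmf(t), rel_tol=1e-9)
          for t in test_points]
print(checks)
```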

Exercises 

3.15. A balanced coin is thrown 100 times under similar experimental
conditions. What is the expected number of heads ?

3.16. A balanced die is rolled. If a person receives $[amount illegible] when the
number 1 or 3 or 5 occurs and loses $8 when 2 or 4 or 6 occurs, how much
money can he expect on the average per roll in the long run ?

3.17. The probabilities that a man fishing at a particular place will
catch 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 fishes are 0.60, 0.20, 0.06, 0.04, 0.03, 0.02, 0.02,
0.01, 0.01, 0.01 respectively. What is the expected number of fish caught ?

3.18. Suppose that the probabilities that sets of 1, 2, 3, 4, 5 persons
come to visit a particular art gallery are 0.20, 0.50, 0.20, 0.07, 0.03 respectively.
What is the expected number of persons per set ?

3.19. Prove that (1) $E(X-c)^2$ is a minimum when $c = E(X) = \mu$ ;

(2) $E|X-c|$ is a minimum when c = M (M = median).
(See section 3.43)

3.20. Find the expected value and standard deviation for the following
distributions.

(1) $f(x) = \begin{cases} 1/8 & \text{for } x=-2 \\ 2/8 & \text{for } x=-1 \\ 3/8 & \text{for } x=0 \\ 2/8 & \text{for } x=2 \\ 0 & \text{elsewhere} \end{cases}$

(2) $f(x) = \begin{cases} \theta e^{-\theta x} & \text{for } 0<x<\infty, \text{ where } \theta>0 \text{ is a constant} \\ 0 & \text{elsewhere} \end{cases}$

3.21. If cumulants or semi-invariants $k_1, k_2, \ldots$ of a probability distribution
are defined as $\log M_X(t) = k_1 t + k_2 t^2/2! + \cdots$ or $\log \phi_X(t) = k_1(it) + k_2(it)^2/2! + \cdots$
whenever $M_X(t)$ does not exist, where log denotes the natural
logarithm, show that the first two cumulants are such that $k_1 = \mu = E(X)$ and
$k_2 = \mu_2 = \mathrm{Var}(X)$. ($k_1, k_2, \ldots$ are called semi-invariants because except for $k_1$
all other k's are invariant (do not vary) under a translation of the variable.
In other words, $k_2, k_3, \ldots$ for the s.v. X are the same as those for the s.v. X + c
where c is a constant.)

3.22. The factorial moment generating function of a probability
distribution is defined as $E(t^X)$ where t is an arbitrary real constant.
That is,

$$F_X(t) = E(t^X)$$

Show that the $r$-th derivative of $F_X(t)$ with respect to t at t = 1 gives the
$r$-th factorial moment.

3.23. A Cauchy distribution is defined as

$$f(x) = \frac{1}{\pi(1+x^2)} \quad \text{for } -\infty<x<\infty$$

Show that E(X) for this distribution does not exist. Does $M_X(t)$ exist
for this distribution ? [See also bibliography (6)].

3.24. The following are some moment generating functions. Find the
corresponding probability distribution by using the uniqueness result.

(a) $M_X(t) = (1/2 + e^t/2)^{10}$ ;

(b) $M_X(t) = (1+2e^t)^{[\text{illegible}]}/3^{[\text{illegible}]}$ ;

(c) $M_X(t) = (2+3e^t)^3/125$.

3.25. Obtain the moment generating function of the following distribution.

$$f(x) = \begin{cases} x & \text{for } 0<x<1 \\ 2-x & \text{for } 1\le x<2 \\ 0 & \text{elsewhere.} \end{cases}$$

3.4. SOME USES OF MOMENTS

Moments are usually used to specify a probability distribution
(this may be noticed from the uniqueness theorem), to locate
a specified point, to measure the scatter or dispersion, to measure
symmetry or skewness (lack of symmetry) and to measure Kurtosis
or peakedness in a probability distribution. Some of these uses
are discussed in the following notes.

3.41. Points of location. If a statistical population is
specified by a stochastic variable X or by the probability distribution
$f(x,\theta)$, we may be interested in a point, say c, such that
$P\{x \le c\} = p\%$ of the total probability where c and p are constants.
This point c is a measure of location in the sense that c locates
the p% point in the population.

3.42. Percentiles. The $p$-th percentile point is that value d
of a variate such that $P\{x \le d\} = p\% = (0.01)p$ where p and d are
constants. For example the first percentile point, say $p_1$, is such
that $P\{x \le p_1\} = 0.01$. By this notation $p_{10}$ is such that $P\{x \le p_{10}\}
= 0.1$. $p_{10}, p_{20}, \ldots$ are called the decile points. $p_{25}$, $p_{50}$ and $p_{75}$
are called the quartile points and they are also denoted by $Q_1$,
$Q_2$ and $Q_3$ respectively. That is, $Q_2$ is such that $P\{x \le Q_2\}
= 0.5$. This is evidently a measure of central tendency of
the distribution. $Q_2$ is also called the Median of the distribution.
Ex. 3.42.1. Find the Median of the following probability
distribution

$$f(x) = \begin{cases} 2x & \text{for } 0<x<1 \\ 0 & \text{elsewhere} \end{cases}$$

Sol. By definition, if M is the median, then

$$P\{x<M\} = P\{x>M\}.$$

But

$$P\{x<M\} = F(M) = \int_{-\infty}^{M} f(x)\,dx = \int_{0}^{M} 2x\,dx = M^2,$$

and since $P\{x<M\} = P\{x>M\}$,

$$M^2 = \int_{M}^{1} 2x\,dx = \frac{1}{2}.$$

$M^2 = 1/2$ or $M = 1/\sqrt{2}$.

(x is always positive here. Hence the negative root is not
admissible).

Comments. It may be noticed that if we are given a set
of numbers or observations then the median of this set may be
defined as that number for which the number of observations less
than it is equal to the number of observations greater than it.
For example among the numbers 1, 7, 25 the number 7 is the
median. In a probability density function, that is when X is
continuous,

$$P\{x<M\} = \tfrac{1}{2} = P\{x>M\}.$$

Some idea about the quartiles is obtained from Fig. 3.12.
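When the equation F(M) = 1/2 cannot be solved by hand, a percentile point can be located numerically. The sketch below uses bisection on the distribution function of Ex. 3.42.1; the function names, the tolerance and the choice of bisection are assumptions made for illustration, not a method prescribed by the text.

```python
def F(x):
    """Distribution function for f(x) = 2x on 0 < x < 1."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x * x

def percentile_point(F, p, lo, hi, tol=1e-12):
    """Find d with F(d) = p by bisection, assuming F is continuous and
    non-decreasing with F(lo) <= p <= F(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

median = percentile_point(F, 0.5, 0.0, 1.0)
first_quartile = percentile_point(F, 0.25, 0.0, 1.0)
print(median, first_quartile)
```

The same routine gives any other percentile point by changing p, for example p = 0.25 and p = 0.75 for the quartiles.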



STOCHASTIC VARIABLES

3.43. Measures of Central Tendency. It is seen that $Q_2$
is a measure of central tendency because $Q_2 = M$ = Median is such
that $P\{x<M\} = P\{x>M\}$ and when X is a continuous s.v. then

$$P\{x<M\} = \tfrac{1}{2} = P\{x>M\}.$$

Other measures of central tendency are the mean and the mode.
$E(X) = \mu$ is called the mean or the mean value of the s.v. X or of
the population designated by X and is a good measure of central
tendency. E(X) may be called the centre of gravity of the probability
distribution. In a discrete distribution, if the s.v. takes
the values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$ respectively
where $\sum p_i = p_1 + p_2 + \cdots + p_n = 1$,

then

$$E(X) = \sum_{i=1}^{n} x_i p_i = \frac{\sum x_i p_i}{\sum p_i}$$

may be considered to be the centre of gravity of the system of
masses $p_1, p_2, \ldots, p_n$ at the points $x_1, x_2, \ldots, x_n$ respectively. (See
Fig. 3.13).

Fig. 3.13.

Another measure of central tendency is the mode. Mode may be
defined as that value of the variate which occurs most frequently
or that value of the variate corresponding to a maximum probability.
For example if a discrete s.v. takes the values 0, 1, 2, 3
with probabilities 1/4, 1/2, 1/8, 1/8, then the mode may be taken
as 1.

Fig. 3.14 (a)

The continuous distribution in Fig. 3.14 (a) has only one
maximum point and is called unimodal. If a distribution has more than
one maximum point it is called multimodal. See Fig. 3.14 (b).
In Fig. 3.14 (b), $d_1$ and $d_2$ are the modes, and the corresponding
distribution is bimodal.

Fig. 3.14 (b).

Ex. 3.43.1. Find the mode or modes of the following distribution :

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} \quad \text{for } -\infty<x<\infty$$

Sol.

$$\frac{d}{dx} f(x) = -\frac{x}{\sqrt{2\pi}}\,e^{-x^2/2} = 0 \ \Rightarrow\ x = 0.$$

The mode is 0. It may be noticed that $\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$ is a
curve symmetric about the y-axis and the only maximum is at
x = 0.

Comments. The mode of a given set of numbers or observations
may be defined as that number which occurs most frequently.
For example in the following set of numbers 1, 2, 3, 3,
3, 4, 4, 5 the mode is 3.

3.44. Measures of Dispersion. A statistician may be interested
in the scatter in a population. Scatter or dispersion may
be broadly classified into three types : (1) the extent to which the
elements are dispersed, that is, the range of the population, (2)
the scatter among individual elements, (3) the scatter or dispersion
of the elements from a point of reference. For example consider
two townships A and B. Let the average income of the citizens in
A be $10,000 a year. If the average income of the citizens in
B is also $10,000 a year, from this information alone we cannot
say that the two townships A and B are the same as far as the
income distribution is concerned. In A there may be a few
millionaires while the majority may be poor. In B perhaps everyone
has an income around $10,000 per year. If we have a measure
of the dispersion or scatter of the incomes from this average
$10,000 in A and B then we can say something more about the
income distribution in A and B.

Range and interquartile range measure the extent to which the
individuals are scattered. The Range is defined as the difference
between the largest and smallest values that a discrete s.v. takes with
non-zero probability, or it is the total range of a continuous s.v. over
which the density is non-zero. For example, if a discrete s.v. takes the
values x_1, ..., x_n with non-zero probabilities, where x_1 is the smallest
and x_n the largest value, then the Range is x_n - x_1.
(Q_3 - Q_1)/2 is called the interquartile range. To measure the scatter among
individuals we use Gini's mean difference. (See The Advanced Theory of
Statistics, Vol. 1, by M.G. Kendall and A. Stuart for further information
about the different measures.)

The mean deviation from a point m may be taken as a
measure of dispersion from the point m, where m is a given
constant. The root mean square deviation from m may also be taken
as a measure of dispersion from m. In particular when m = E(X) the
root mean square deviation is the standard deviation. The same
quantities may be evaluated as measures of dispersion or scatter
from the point m in a set of observations x_1, ..., x_n. Standard
deviation, mean deviation etc. depend on the units of measurement.
To compare the dispersion in two populations usually a measure
called the coefficient of variation = (standard deviation)/(mean),
if the mean is non-zero and if the s.v.'s take positive values, is used.

Ex. 3.44.1. Find the mean deviation and root mean square
deviation from 4 for the following distribution.

$$f(x) = \begin{cases} \dfrac{2x}{9} & \text{for } 0<x<3 \\ 0 & \text{elsewhere.} \end{cases}$$

Sol. The mean deviation from 4 is equal to

$$E|X-4| = \int_{-\infty}^{\infty} |x-4|\,f(x)\,dx = \int_0^3 |x-4|\,\frac{2x}{9}\,dx = \int_0^3 (4-x)\,\frac{2x}{9}\,dx$$

(since x < 4, 4 - x is always positive)

$$= \frac{2}{9}\left[\frac{4x^2}{2} - \frac{x^3}{3}\right]_0^3 = \frac{2}{9}\,(18-9) = 2.$$

The root mean square deviation from 4 is

$$\{E(X-4)^2\}^{1/2}.$$

But

$$E(X-4)^2 = \int_0^3 (x-4)^2\,\frac{2}{9}\,x\,dx = \frac{2}{9}\int_0^3 (x^3 - 8x^2 + 16x)\,dx = \frac{2}{9}\left[\frac{x^4}{4} - \frac{8x^3}{3} + 8x^2\right]_0^3 = \frac{9}{2}.$$

$$\{E(X-4)^2\}^{1/2} = \sqrt{9/2} = 3/\sqrt{2}.$$

Comments. Similarly other measures of dispersion from
the point 4 or from E(X) or from any other point may be evaluated
for this population designated by the s.v. X with the
probability function as defined in this example.
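Integrals of this kind can also be approximated numerically, which gives an independent check on the hand computation. The sketch below uses a simple midpoint rule; the function names and the number of subintervals are assumptions chosen for illustration.

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of g over (a, b)."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Density of the example: 2x/9 on 0 < x < 3, zero elsewhere."""
    return 2 * x / 9 if 0 < x < 3 else 0.0

c = 4  # the point from which the deviations are measured
mean_deviation = integrate(lambda x: abs(x - c) * f(x), 0, 3)
rms_deviation = integrate(lambda x: (x - c) ** 2 * f(x), 0, 3) ** 0.5
print(mean_deviation, rms_deviation)
```

Changing `c` to E(X), or to any other point, gives the corresponding measures of dispersion mentioned in the comments.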

3.45. Measures of Skewness. If a probability distribution
is symmetric then it is easily seen that the odd central
moments $\mu_1, \mu_3, \ldots$ are all zero. So the odd central moments
may be taken as a measure of skewness or lack of symmetry in
the probability distribution. To be independent of the units of
measurement, $\mu_3/\sigma^3, \mu_5/\sigma^5, \ldots$ may be taken as measures of
skewness where $\sigma$ is the standard deviation ($\sigma \ne 0$ is assumed). If
an odd central moment is zero this does not necessarily mean that
the distribution is symmetric. So these different measures are
measures of skewness only to some extent. Different types of
skewed distributions are shown in Fig. 3.15.

Fig. 3.15. Skewed distributions; panel (a) is skewed to the right.

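Ex. 3.45.1. Examine the symmetry and evaluate the coefficient
of variation in the following probability distribution :

$$f(x) = \begin{cases} 1/4 & \text{for } x=2 \\ 1/2 & \text{for } x=3 \\ 1/4 & \text{for } x=4 \\ 0 & \text{elsewhere} \end{cases}$$

Sol. $E(X) = 2\times\tfrac14 + 3\times\tfrac12 + 4\times\tfrac14 = 3$

$$\mu_3 = (2-3)^3\times\tfrac14 + (3-3)^3\times\tfrac12 + (4-3)^3\times\tfrac14 = 0.$$

Similarly all the odd central moments $\mu_5, \ldots = 0$.

The distribution may be considered to be symmetric.

$$\mu_2 = E(X-3)^2 = (2-3)^2\times\tfrac14 + (3-3)^2\times\tfrac12 + (4-3)^2\times\tfrac14 = 1/2$$

$$\sigma = \sqrt{\mu_2} = 1/\sqrt{2}.$$

The coefficient of variation is

$$\frac{\sigma}{\mu} = \frac{1}{\sqrt{2}\times 3} = \frac{1}{3\sqrt{2}}.$$

Comments. For continuous distributions symmetry may
be checked in a similar fashion.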


3.46. Kurtosis. Kurtosis or peakedness of a probability
distribution is usually measured by

$$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3$$

(where $\gamma$ is a Greek letter called gamma).
For the Gaussian or Normal distribution $\mu_4/\sigma^4 = 3$. This is
taken as a standard to measure Kurtosis. Distributions for
which $\gamma_2 = 0$, $> 0$, $< 0$ are called mesokurtic, leptokurtic and
platykurtic, respectively.

Fig. 3.16.

However if $\gamma_2 = 0$, $> 0$, $< 0$ the shapes of the curves need not be as
shown in Fig. 3.16. So $\gamma_2$ does not tell much about the shape of
the distribution.


Exercises 

3.26. Give one example each of a probability distribution which is
(1) symmetric, (2) skewed to the right, (3) skewed to the left.

3.27. Evaluate (1) the Median, (2) the second decile point, (3) the centre
of gravity = E(X), (4) the range, (5) the interquartile range, (6) the standard
deviation, of the following distribution.

$$f(x) = \begin{cases} 1/\theta & \text{for } 0<x<\theta, \text{ where } \theta>0 \text{ is a parameter} \\ 0 & \text{elsewhere} \end{cases}$$

3.28. The following are some important inequalities, given for the
information of the reader. The reader may try to prove them.

(1) $E|X| \le \{E|X|^r\}^{1/r}$ for $r > 1$ ;

(2) $\{E|X|^r\}^{1/r} \le \{E|X|^s\}^{1/s}$ for $0 < r < s$ ;

(3) $\log E|X|^r$ is a convex function of r. (A function h(x) is convex
if $h(\alpha x + \beta y) \le \alpha h(x) + \beta h(y)$ for all x, y, $\alpha > 0$, $\beta > 0$, $\alpha + \beta = 1$) ;

(4) If h(X) is a convex function of X and if E(X) exists then $h[E(X)]
\le E[h(X)]$. (This is known as Jensen's inequality).

3.5. CHEBYSHEV’S THEOREM 

In section 3.44 we had seen that the standard deviation may 
be taken as a measure of dispersion from the expected value. 
Now we will prove a theorem due to a Russian Mathematician 
Chebyshev (also spelled as Tchebycheff) which will throw some 
light on the importance of the standard deviation in statistical 
analysis. The theorem states that 

$$P\{|x-\mu| \ge k\sigma\} \le \frac{1}{k^2} \qquad (3.39)$$

where $k$ is an arbitrary positive constant, $\mu = E(X)$ and $\sigma$ is the
standard deviation, i.e., the probability that the absolute value
$|x-\mu|$ is at least $k$ times the standard deviation is at most
$1/k^2$, or in other words the probability that $x$ is less than $\mu - k\sigma$ or
greater than $\mu + k\sigma$ is at most $1/k^2$. According to the theorem the
probability that $x$ falls below $\mu - k\sigma$ or above $\mu + k\sigma$, that is, the
shaded area in the figure, is at most $1/k^2$:

$$P\{|x-\mu| \ge k\sigma\} = \int_{-\infty}^{\mu-k\sigma} f(x)\,dx + \int_{\mu+k\sigma}^{\infty} f(x)\,dx \le \frac{1}{k^2}$$

if X is continuous.


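Proof. Let X be a continuous s.v.

$$\sigma^2 = E[X-E(X)]^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$$

$$= \int_{-\infty}^{\mu-k\sigma} (x-\mu)^2 f(x)\,dx + \int_{\mu-k\sigma}^{\mu+k\sigma} (x-\mu)^2 f(x)\,dx + \int_{\mu+k\sigma}^{\infty} (x-\mu)^2 f(x)\,dx \qquad (3.40)$$

$$\therefore\ \sigma^2 \ge \int_{-\infty}^{\mu-k\sigma} (x-\mu)^2 f(x)\,dx + \int_{\mu+k\sigma}^{\infty} (x-\mu)^2 f(x)\,dx$$

(since $(x-\mu)^2 f(x) \ge 0$ in the interval $\mu - k\sigma$ to $\mu + k\sigma$). But

$$|x-\mu| \ge k\sigma \Rightarrow (x-\mu)^2 \ge k^2\sigma^2 \qquad (3.41)$$

$$\therefore\ \sigma^2 \ge \int_{-\infty}^{\mu-k\sigma} k^2\sigma^2 f(x)\,dx + \int_{\mu+k\sigma}^{\infty} k^2\sigma^2 f(x)\,dx \qquad (3.42)$$

$$\therefore\ \frac{1}{k^2} \ge \int_{-\infty}^{\mu-k\sigma} f(x)\,dx + \int_{\mu+k\sigma}^{\infty} f(x)\,dx = P\{|X-\mu| \ge k\sigma\}.$$

The proof when X is discrete is left to the reader.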

For a more rigorous statement and generalizations of the
theorem the reader may refer to Mathematical Methods of
Statistics by H. Cramer and Linear Statistical Inference and Its
Applications by C.R. Rao. This inequality is very important for
many reasons. The proximity of a s.v. to its expected value,
measured in terms of a measure of dispersion, namely, the
standard deviation, is given by this inequality. The exact value
of $P\{|x-\mu| \ge k\sigma\}$ can be evaluated by knowing the distribution
of X. But here no emphasis is put on the functional form of the
probability function of X. Hence this inequality may be considered
to be a 'distribution free' property, in the sense that the
property does not depend on the distribution of X. Instead of
using the standard deviation we can use any other measure of
dispersion and obtain the corresponding inequalities.


Ex. 3.5.1. The probability of survival in case of a particular
disease D is found to be 0.80. One hundred people are attacked by
D in a particular area. If X denotes the number of survivals,
assuming that X follows a Binomial distribution with parameters
N=100 and p=0.80, find (1) an upper bound for the probability
that the number of survivals will be either less than 68 or greater than
92; (2) a lower bound for the probability that the number of survivals
is between 68 and 92.


Sol. (1) When X follows a Binomial distribution,

$$f(x) = \binom{N}{x} p^x (1-p)^{N-x}.$$

It can be seen from Ex. 3.35.1 and Ex. 3.35.2 that the expected
value of X is $Np$ and the variance of X is $Np(1-p)$, and hence the
standard deviation is

$$\sigma = [Np(1-p)]^{1/2}.$$

Here $N = 100$, $p = 0.80$ and $1-p = 0.20$. Therefore

$$Np = \mu = 100 \times 0.80 = 80$$

and

$$\sigma^2 = Np(1-p) = 100 \times 0.80 \times 0.20 = 16$$

or $\sigma = 4$.

But $80 + 3 \times 4 = 92$ and $80 - 3 \times 4 = 68$,
that is, $\mu + k\sigma = 92$ and $\mu - k\sigma = 68$ where $k = 3$.

According to Chebyshev's inequality,

$$P\{|x-\mu| \ge k\sigma\} \le \frac{1}{k^2},$$

that is, $P\{|x-80| \ge 12\} \le 1/9$,

so that $P\{x < 68 \text{ or } x > 92\} \le 1/9$.

(In this example, since the population is known to be Binomial
we can calculate the exact probabilities instead of the approximate
probabilities given by the inequalities, if we have a table
of Binomial probabilities.)

Hence an upper bound for the required probability is 1/9.

(2) $P\{68 \le x \le 92\} = 1 - P\{x < 68 \text{ or } x > 92\} \ge 1 - \dfrac{1}{k^2}$, where $k = 3$.

The required probability limit is $1 - 1/9 = 8/9$.
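Since the population in this example is Binomial, the Chebyshev bound can be compared with the exact tail probability. The sketch below is illustrative only: it assumes N = 100, p = 0.80 and k = 3 as in the example, and uses exact rational arithmetic from the Python standard library.

```python
from fractions import Fraction
from math import comb, isqrt

n, p, k = 100, Fraction(4, 5), 3

mu = n * p                         # expected value Np = 80
var = n * p * (1 - p)              # variance Np(1 - p) = 16
sigma = isqrt(int(var))            # standard deviation 4 (16 is a perfect square)
lo, hi = mu - k * sigma, mu + k * sigma

# Exact probability that the number of survivals is below lo or above hi.
exact_tail = sum(
    comb(n, x) * p ** x * (1 - p) ** (n - x)
    for x in range(n + 1)
    if x < lo or x > hi
)
chebyshev_bound = Fraction(1, k ** 2)

print("limits:", lo, hi)
print("exact tail probability:", float(exact_tail))
print("Chebyshev upper bound :", float(chebyshev_bound))
```

The exact tail is far smaller than 1/9, which illustrates that the inequality is a distribution-free bound and not an approximation to the probability itself.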




Comments. It may be noticed that

$$P\{|x-\mu| < k\sigma\} \ge 1 - \frac{1}{k^2}. \qquad (3.44)$$

This is another way of stating Chebyshev's inequality.
For $k = 1, 2, 3, \ldots$ we get the various probability statements about
the closeness of X to $\mu$ in terms of $\sigma$.

Ex. 3.5.2. In Ex. 3.5.1 it is given that $P\{|x-\mu| \ge k\} \le 0.25$.
Find k.

Sol. By Chebyshev's theorem,

$$P\{|x-\mu| \ge k\sigma\} \le \frac{1}{k^2}.$$

If $k$ is replaced by $k/\sigma$ then the inequality may be written
as

$$P\{|x-\mu| \ge k\} \le \frac{\sigma^2}{k^2}. \qquad (3.45)$$

We are given that $P\{|x-\mu| \ge k\} \le 0.25$.
$\therefore$ k may be obtained by taking

$$\frac{\sigma^2}{k^2} = 0.25$$

or

$$k = \frac{\sigma}{0.5} = 2\sigma = 8.$$

Comments. It may be noticed that Chebyshev's inequality
as stated in the above theorem does not give the least upper
bound for the probability that $|x-\mu| \ge k\sigma$. Still smaller upper
bounds for $P\{|x-\mu| \ge k\sigma\}$ may be established. This will not be
discussed here.

Exercises 

3.29. If X follows a Binomial distribution

$$f(x, \theta) = \binom{N}{x} p^x q^{N-x}$$

where $q = 1-p$, $p = 0.4$, $N = 10$, find k for the following problems.

(1) $P\{|x-\mu| > k\sigma\} < 0.26$,

(2) $P\{|x-\mu| > k\} < 0.5$, where $\mu = E(X)$ and $\sigma$ is the standard
deviation of X.

3.30. If X follows a Binomial distribution with the parameters N and
p specified as N=40 and p=1/2, obtain lower limits for the probabilities,

(1) $P\{|x-\mu| < 10\}$,

(2) $P\{|x-\mu| < 20\}$,

(3) $P\{|x-\mu| < 30\}$,

by using Chebyshev's theorem and also explain these limits in words.

3.31. A Gamma distribution is defined as

$$f(x) = \frac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)} \text{ for } x>0,\ \alpha>0, \text{ and } f(x)=0 \text{ elsewhere,}$$

where $\Gamma(\alpha)$ (gamma alpha) is defined as

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx.$$

It can be seen that $\Gamma(\alpha) = (\alpha-1)\Gamma(\alpha-1)$ and $\Gamma(1/2) = \sqrt{\pi}$. Obtain
inequalities for the following probabilities by using Chebyshev's theorem.

(1) $P\{|x-\mu| < 2\sigma\}$,

(2) $P\{|x-\mu| > 2\sigma\}$,

(3) $P\{|x-\mu| > 2\}$, where $\mu = E(X)$ and $\sigma$ = the standard deviation
of X.

3.32. Prove the following inequalities : 

(1) P{ | x-v. [ >/c for 1 ; 

(2) P{ | x— n | >/c) <3 r jk r for 1, where (3 r =E ] X— | r . 

[Hint. Proceed in a similar fashion as in the proof of Chebyshev’s 
inequality]. 

3.6. COMPLETENESS 

We denoted a probability function by $f(x, \theta)$ where $\theta$ stands
for all the parameters in the probability function. We have also
seen that if the parameters are specified then the probability
distribution is completely specified. For example, in an exponential
distribution with one parameter the density function is

$$f(x, \theta) = \begin{cases} \dfrac{1}{\theta}\, e^{-x/\theta} & 0<x<\infty,\ \theta>0 \\ 0 & \text{elsewhere.} \end{cases}$$

$\theta = 4$ and $\theta = 2$ give two exponential distributions with
no parameters (i.e., completely specified). In order that $f(x, \theta)$ be
a probability function the only restriction on $\theta$ is that $\theta > 0$. In
other words $\theta$ is a constant which is defined as $0 < \theta < \infty$. For the
various possible values of $\theta$, $f(x, \theta)$ defines a family of exponential
distributions. In general if a probability function has a parameter
it will define a family of probability distributions, namely,
all the distributions with the probability functions having the
same functional form except for the value of the parameter.

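In the statistical theory of estimation and related problems
we often need a function of X, say $\psi(X)$, such that the
expected value of $\psi(X)$ is $\theta$. If there exist two functions such
that the expected values of both the functions are $\theta$ then $\psi(X)$ is
not unique. By taking the difference of two such functions we get
a function, say $\phi(x)$, such that $E\phi(X) = 0$. So $\psi(X)$ is
unique if the probability function of X has the property
that $E\phi(X) = 0 \Rightarrow \phi(x) = 0$, irrespective of the value of the
parameter in the probability function of X.

Definition. Let $f(x, \theta)$ be the probability function of X and let
$\phi(X)$ be a continuous function of X, independent of the parameter
$\theta$. If $E\phi(X) = 0$ for all $\theta$ implies that $\phi(x) = 0$ except on a set
of probability zero, then $f(x, \theta)$ is called a complete probability
function.

Ex. 3.6.1. Examine whether the following probability measures
are complete:

(a) $f(x) = (2\pi)^{-1/2}\, e^{-x^2/2}$, $-\infty < x < \infty$.

(b) $f(x, \beta) = (2\pi\beta^2)^{-1/2}\, e^{-x^2/(2\beta^2)}$, $-\infty < x < \infty$, $\beta > 0$.

(c) $f(x, \alpha) = (2\pi)^{-1/2}\, e^{-(x-\alpha)^2/2}$, $-\infty < x < \infty$, $-\infty < \alpha < \infty$.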

Sol. (a) Consider the function $\phi(x) = x$.

$$E(X) = (2\pi)^{-1/2} \int_{-\infty}^{\infty} x\, e^{-x^2/2}\,dx = 0.$$

Evidently $x$ is a non-zero continuous function of $x$ and hence $f(x)$
is not complete.

(b) $$E(X) = (2\pi\beta^2)^{-1/2} \int_{-\infty}^{\infty} x\, e^{-x^2/(2\beta^2)}\,dx = 0.$$

Irrespective of the value of $\beta$ there exists a non-zero function
independent of $\beta$, namely $\phi(x) = x$, such that $E\phi(X) = 0$ and hence
$f(x, \beta)$ is not complete.

(c) It can be shown that

$$(2\pi)^{-1/2} \int_{-\infty}^{\infty} \phi(x)\, e^{-(x-\alpha)^2/2}\,dx = 0 \text{ for all } \alpha \ \Rightarrow\ \phi(x) = 0$$

almost everywhere. The proof is beyond the scope of this book.
(One method of proving this result is by using the uniqueness of
Laplace transforms.)
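The argument in part (b) can be illustrated numerically: for the zero-mean Normal family, the non-zero function φ(x) = x has expected value zero whatever the value of β. The sketch below approximates the integral by a midpoint rule; the helper names, the integration range of 12 standard deviations and the step count are assumptions made for illustration.

```python
from math import exp, pi, sqrt

def normal_pdf(x, beta):
    """Density (2*pi*beta^2)^(-1/2) * exp(-x^2 / (2*beta^2))."""
    return exp(-x * x / (2 * beta * beta)) / (beta * sqrt(2 * pi))

def expectation(phi, beta, steps=20000):
    """Midpoint-rule approximation of E[phi(X)] over [-12*beta, 12*beta]."""
    a, b = -12 * beta, 12 * beta
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += phi(x) * normal_pdf(x, beta)
    return total * h

# E[phi(X)] for phi(x) = x and several values of the parameter beta.
values = [expectation(lambda x: x, beta) for beta in (0.5, 1.0, 3.0)]
total_measure = expectation(lambda x: 1.0, 1.0)  # sanity check: should be close to 1

print(values, total_measure)
```

A numerical check of this kind can only exhibit a counterexample to completeness; it cannot establish completeness, as in part (c).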

Exercise

3.33. Construct 2 examples of a family of probability functions which
is not complete.

CHAPTER 4

SPECIAL UNIVARIATE DISTRIBUTIONS

4.0. Introduction. Here some of the most commonly used
univariate (one variate) probability distributions will be discussed.
In chapter 3 we defined probability distributions and a general
notation $f(x, \theta)$ was introduced where $\theta$ denotes all the parameters
in the probability function. For convenience some of the important
univariate discrete and univariate continuous distributions
are given in tabular forms. In the later sections these distributions
and some of their important properties are individually
discussed.

4.1. DISCRETE AND CONTINUOUS PROBABILITY MODELS 

In practical situations where the behaviour of a particular
characteristic (say, one stochastic variable) is under study,
we may not know the appropriate probability distribution for the
situation under consideration. By examining the experimental
conditions, such as the possible outcomes of an experimental trial, the
probability of an outcome, independence etc., we may be able to set
up an appropriate probability model. Once we have a probability
model or a probability distribution the experimental results can
be studied in greater detail. In sections 4.11 and 4.12 some of
the most frequently used univariate probability models are given.
The special experimental conditions for which these models
are appropriate are discussed in later sections.


4.11. Discrete distributions.

| Name | Probability function $f(x, \theta)$ | Parameters $\theta$ |
|---|---|---|
| 1. The Binomial distribution | $\binom{N}{x} p^x (1-p)^{N-x}$ for $x = 0, 1, \ldots, N$ and $f(x, \theta) = 0$ elsewhere | $(N, p)$; $0 < p < 1$, $N$ a positive integer |
| 2. The Hypergeometric distribution | $\binom{a}{x}\binom{b}{n-x} \Big/ \binom{a+b}{n}$ for $x = 0, 1, \ldots, n$ and 0 elsewhere | $(a, b, n)$, all positive integers |
| 3. The Poisson distribution | $\lambda^x e^{-\lambda}/x!$ for $x = 0, 1, \ldots, \infty$ and 0 elsewhere | $(\lambda)$; $\lambda > 0$ |
| 4. The Negative Binomial distribution | $\binom{x+k-1}{x} p^k (1-p)^x$ for $x = 0, 1, \ldots, \infty$ and 0 elsewhere | $(p, k)$; $0 < p < 1$, $k$ a positive integer |
| 5. The discrete uniform distribution | $1/n$ for $x$ equal to some $x_1, x_2, \ldots, x_n$ and 0 elsewhere | $(n)$; $n$ a positive integer |
| 6. The discrete Geometric distribution | $\theta(1-\theta)^{x-1}$ for $x = 1, 2, \ldots, \infty$ and 0 elsewhere | $(\theta)$; $0 < \theta < 1$ |

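4.12. Continuous Distributions.

| Name | Probability density $f(x, \theta)$ | Parameters $\theta$ |
|---|---|---|
| 1. The Uniform or rectangular distribution | $\dfrac{1}{\beta-\alpha}$ for $\alpha < x < \beta$; $= 0$ elsewhere | $(\alpha, \beta)$; $\alpha > 0$, $\beta > \alpha$ |
| ,, | $\dfrac{1}{\theta}$ for $0 < x < \theta$; $= 0$ elsewhere | $(\theta)$; $\theta > 0$ |
| 2. The Exponential distribution | $\dfrac{1}{\theta} e^{-x/\theta}$ for $0 < x < \infty$; $= 0$ elsewhere | $(\theta)$; $\theta > 0$ |
| *3. The Gamma distribution | $\dfrac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$ for $0 < x < \infty$; $= 0$ elsewhere | $(\alpha, \beta)$; $\alpha, \beta > 0$ |
| ,, | $\dfrac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}$ for $0 < x < \infty$; $= 0$ elsewhere | $(\alpha)$; $\alpha > 0$ |
| *4. The Beta distribution | $\dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$ for $0 < x < 1$; $= 0$ elsewhere | $(\alpha, \beta)$; $\alpha, \beta > 0$ |
| **5. The Cauchy distribution | $\dfrac{1}{\pi[1+(x-\theta)^2]}$, $-\infty < x < \infty$ | $(\theta)$; $-\infty < \theta < \infty$ |
| ,, | $\dfrac{1}{\pi[1+x^2]}$, $-\infty < x < \infty$ | $(\cdot)$ |
| 6. The Gaussian or normal distribution | $\dfrac{1}{\beta\sqrt{2\pi}}\, e^{-(x-\alpha)^2/(2\beta^2)}$, $-\infty < x < \infty$ | $(\alpha, \beta)$; $-\infty < \alpha < \infty$, $\beta > 0$ |
| ,, | $\dfrac{1}{\beta\sqrt{2\pi}}\, e^{-x^2/(2\beta^2)}$, $-\infty < x < \infty$ | $(\beta)$; $\beta > 0$ |
| ,, | $\dfrac{1}{\sqrt{2\pi}}\, e^{-(x-\alpha)^2/2}$, $-\infty < x < \infty$ | $(\alpha)$; $-\infty < \alpha < \infty$ |
| ,, | $\dfrac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$, $-\infty < x < \infty$ | $(\cdot)$ |
| 7. The Pearson curves | … | $(a, b, c, d)$ |
| 8. The Student's t distribution | $\dfrac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{k\pi}\;\Gamma\left(\frac{k}{2}\right)} \left(1+\dfrac{x^2}{k}\right)^{-(k+1)/2}$ for $-\infty < x < \infty$ | $(k)$; $k$ a positive integer |
| 9. The Chi-square distribution | $\dfrac{x^{(k/2)-1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}$ for $0 < x < \infty$; $= 0$ elsewhere | $(k)$; $k$ a positive integer |
| 10. The F-distribution | $\dfrac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \left(\dfrac{m}{n}\right)^{m/2} x^{(m/2)-1} \left(1+\dfrac{m}{n}x\right)^{-(m+n)/2}$ for $0 < x < \infty$; $= 0$ elsewhere | $(m, n)$; $m$, $n$ positive integers |

*$\Gamma(\alpha)$ (Gamma alpha) is defined as

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha-1} e^{-x}\,dx$$

where $\Gamma$ is the Greek capital letter Gamma. It can be proved that
$\Gamma(\alpha) = (\alpha-1)\Gamma(\alpha-1)$ and $\Gamma(1/2) = \sqrt{\pi}$. If $\alpha$ is a positive integer, $\Gamma(\alpha) = (\alpha-1)!$.

*$B(\alpha, \beta)$ (beta alpha beta) is defined as

$$B(\alpha, \beta) = \int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx,$$

where B is the Greek capital letter beta and $\beta$ is the Greek small letter beta.
It can be proved that

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}.$$

**$(\cdot)$ means that there is no parameter in this probability distribution.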

The Pearsonian system of curves will give a number of probability
distributions for various values of a, b, c and d. This
may be seen from an exercise given at the end of section 4.8. The
differential equation in 7 gives the density function. The range
of x may be determined by the axioms for a probability function.
The reader is advised to learn by heart the parts involving x in
the various density functions together with the range of the
stochastic variables. The constants in the various density functions
(the parts independent of x) may be determined by the
property that the total measure is unity,

$$\int_{-\infty}^{\infty} f(x, \theta)\,dx = 1 \quad \text{or} \quad \sum_{-\infty < x < \infty} f(x, \theta) = 1.$$
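The remark that the constant in a density is fixed by the total measure being unity can be illustrated numerically. The sketch below takes the part of the one-parameter Gamma density that involves x, integrates it by a simple midpoint rule, and compares the result with the Gamma function from the Python standard library. The choice α = 3, the upper limit of 60 and the step count are assumptions made for illustration.

```python
from math import exp, gamma

def integrate(g, a, b, steps=200000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

alpha = 3.0

def kernel(x):
    """The part of the Gamma density involving x: x^(alpha - 1) * e^(-x)."""
    return x ** (alpha - 1) * exp(-x)

# The tail beyond x = 60 is negligible for this alpha, so [0, 60] stands in for [0, infinity).
total = integrate(kernel, 0.0, 60.0)
constant = 1 / total          # the constant that makes the total measure unity

print("integral of kernel:", total)
print("Gamma(alpha)      :", gamma(alpha))
print("normalizing const :", constant)
```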

4.2. THE BINOMIAL DISTRIBUTION 

This is the most widely used univariate discrete distribution.

The probability function is

$$f(x, \theta) = \binom{N}{x} p^x q^{N-x} \quad \text{where } q = 1-p$$

and N and p are parameters, i.e., for a given Binomial distribution
N and p are constants. For different values of N and p
different Binomial distributions are obtained. So a Binomial
distribution is completely known if N and p are given.

This distribution may be easily obtained by considering an
experiment of repeated trials. Suppose that we are interested in
finding out the probabilities of getting exactly 3 heads in 10 trials
of tossing a coin, getting 5 boys among 20 babies born in a
hospital given the probability that a new born baby in that
hospital is a boy, four out of a group of pneumonia patients dying
given that the probability of death in case of pneumonia is 0.01,
etc. The Binomial probability law may be applied in situations
where (1) a trial results in either a success or a failure, (2) the
probability of occurrence of an event (we usually call this the
probability of a success, in the sense of being favourable to the event
under consideration; for example, if we are interested in the event of a
death then a success is a death) remains the same from trial
to trial, (3) the trials are independent. If these three conditions
are satisfied then the probability of getting exactly x successes in
N trials may be seen to be $\binom{N}{x} p^x (1-p)^{N-x}$, where p is the
probability of a success in any trial. Suppose that the first x
trials are successes and the remaining $N-x$ are failures; then the
probability of getting the first x successes and the remaining $N-x$
failures is $p^x(1-p)^{N-x}$. (Since all the trials are independent the
probability is the product of the individual probabilities.) Suppose
that the first trial is a failure, the next x are successes, and the
remaining ones are failures. The probability for this is
$q\,p^x q^{N-x-1} = p^x q^{N-x}$ where $q = 1-p$. So if we specify any subset
of x trials resulting in successes and the remaining trials resulting
in failures the probability for this is $p^x q^{N-x}$. But a subset of
x elements from a set of N elements may be selected in $\binom{N}{x}$
ways. Therefore the probability of getting exactly x successes in
N trials is $\binom{N}{x} p^x q^{N-x}$.

It is seen that the Binomial probability law applies in situations
where:

(1) Any trial results in a success or a failure;

(2) There are N repeated trials which are independent;

(3) The probability of a success in any trial is p.

Such a situation may be called a Binomial probability situation.
If we have a situation different from this then the Binomial law is
not applicable. Some other probability distribution may be found
out. So in a Binomial situation it is seen that

$$f(x, \theta) = \begin{cases} \binom{N}{x} p^x q^{N-x} & \text{for } x = 0, 1, 2, \ldots, N \\ 0 & \text{elsewhere,} \end{cases} \qquad 0 < p < 1,\ q = 1-p. \qquad (4.1)$$

This is called a Binomial distribution because the probabilities of
getting 0, 1, 2, 3, ... successes in N trials may be obtained as
the first, second, third, ... terms of the Binomial expansion

$$(q+p)^N = \binom{N}{0} p^0 q^{N-0} + \binom{N}{1} p^1 q^{N-1} + \cdots + \binom{N}{x} p^x q^{N-x} + \cdots + \binom{N}{N} p^N q^0. \qquad (4.2)$$

But $q = 1-p$ and therefore $q+p = 1$.

$$\therefore\ 1 = \binom{N}{0} p^0 q^{N-0} + \cdots + \binom{N}{x} p^x q^{N-x} + \cdots + \binom{N}{N} p^N q^0 \qquad (4.3)$$

i.e., the probabilities of getting 0, 1, 2, ... successes add up
to unity. This may be expected because the event of getting
0 or 1 or ... or N successes is a sure event. It may be noticed
that the Binomial coefficients are symmetric in the sense

$$\binom{N}{x} = \binom{N}{N-x}.$$

But the Binomial probabilities or the different terms of the
Binomial expansion of $(q+p)^N$ need not be symmetric. They are
symmetric if $p = q = 1/2$. If $p \neq q$ then

$$\binom{N}{0} p^0 q^{N-0} \neq \binom{N}{N} p^N q^0, \quad \binom{N}{1} p\, q^{N-1} \neq \binom{N}{N-1} p^{N-1} q, \text{ etc.}$$

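Ex. 4.2.1. Find the probability of getting (1) exactly 3 heads,
(2) at least 3 heads, (3) at most 3 heads, if a coin is tossed five times,
assuming that the probability of getting a head in any trial is 1/3.

Sol. This is a Binomial situation of repeated independent
trials. According to our notation N = 5, p = 1/3.

In (1) x = 3, in (2) x = 3 or 4 or 5 and in (3) x = 0 or 1 or
2 or 3. Therefore the required probabilities are

(1) $$\binom{5}{3}\left(\frac{1}{3}\right)^3\left(\frac{2}{3}\right)^2 = \frac{40}{243}$$

(2) $$\binom{5}{3}\left(\frac{1}{3}\right)^3\left(\frac{2}{3}\right)^2 + \binom{5}{4}\left(\frac{1}{3}\right)^4\left(\frac{2}{3}\right) + \binom{5}{5}\left(\frac{1}{3}\right)^5 = \frac{51}{243} = \frac{17}{81}$$

(3) $1 -$ [Probability of getting at least 4 heads]

$$= 1 - \left[\binom{5}{4}\left(\frac{1}{3}\right)^4\left(\frac{2}{3}\right) + \binom{5}{5}\left(\frac{1}{3}\right)^5\right] = 1 - \frac{11}{243} = \frac{232}{243}.$$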

Comments. In a Binomial probability distribution, if the various
probabilities of getting 0, 1, 2, ... successes are represented by a
histogram we get a picture of the distribution. The Binomial
probabilities for various values of N, p and x are tabulated.

4.21. Moments. By definition

$$E(X) = \sum_{x=0}^{N} x \binom{N}{x} p^x q^{N-x} = \sum_{x=1}^{N} x\, \frac{N!}{x!\,(N-x)!}\, p^x q^{N-x}$$

(when x=0 the corresponding term is zero)

$$= \sum_{x=1}^{N} \frac{N!}{(x-1)!\,(N-x)!}\, p^x q^{N-x} = Np \sum_{x=1}^{N} \frac{(N-1)!}{(x-1)!\,[(N-1)-(x-1)]!}\, p^{x-1} q^{(N-1)-(x-1)}.$$

Put $x-1 = y$ and $N-1 = n$. Then

$$E(X) = Np \sum_{y=0}^{n} \frac{n!}{y!\,(n-y)!}\, p^y q^{n-y} = Np \sum_{y=0}^{n} \binom{n}{y} p^y q^{n-y} = Np\,(q+p)^n$$

[since $(q+p)^n$ when expanded gives all the terms in $\sum_{y=0}^{n} \binom{n}{y} p^y q^{n-y}$]

$$= Np \quad (\text{since } q+p = 1).$$

$$E(X) = Np = \mu \text{ (say)} \qquad (4.4)$$

$$\mathrm{Var}(X) = E[X-E(X)]^2 = E(X^2) - \mu^2$$

$$E(X^2) = \sum_{x=0}^{N} x^2 \binom{N}{x} p^x q^{N-x} = \sum_{x=0}^{N} x(x-1) \binom{N}{x} p^x q^{N-x} + \sum_{x=0}^{N} x \binom{N}{x} p^x q^{N-x}$$

[since $x^2 = x(x-1) + x$]

$$= \sum_{x=2}^{N} x(x-1)\, \frac{N!}{x!\,(N-x)!}\, p^x q^{N-x} + \mu = N(N-1)p^2 \sum_{x=2}^{N} \frac{(N-2)!}{(x-2)!\,(N-x)!}\, p^{x-2} q^{N-x} + \mu.$$

Put $N-2 = n$ and $x-2 = y$. Then the above equation is

$$= N(N-1)p^2 \sum_{y=0}^{n} \binom{n}{y} p^y q^{n-y} + \mu = N(N-1)p^2 (q+p)^n + \mu = N(N-1)p^2 + \mu.$$

$$E(X^2) = N(N-1)p^2 + Np \qquad (4.5)$$

$$\mathrm{Var}(X) = E(X^2) - \mu^2 = N(N-1)p^2 + Np - (Np)^2 = Np - Np^2 = Np(1-p) = Npq \qquad (4.6)$$

The standard deviation,

$$(\text{S.D.}) = \sqrt{Npq} \qquad (4.7)$$


E(X) = Np means that on the average we can expect Np successes.
For example, if the probability of death from a particular disease
is 0.01, and if 1000 people are affected by this disease, we can
expect $Np = 1000 \times 0.01 = 10$ deaths on the average, in the sense
that even though there may not be exactly 10 deaths in a given
set of 1000 patients, if we consider many such batches of 1000
patients then on the average there may be 10 deaths per 1000
patients. The variance in this case is $Npq = 1000 \times 0.01 \times 0.99
= 9.9$ and the standard deviation or a measure of dispersion is
$(1000 \times 0.01 \times 0.99)^{1/2}$.
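The formulas E(X) = Np and Var(X) = Npq can be confirmed directly from the definition by summing over the probability function. The sketch below does this with exact fractions for the illustration N = 1000, p = 0.01; the helper name `binomial_moments` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n, p):
    """Return (mean, variance) computed from the Binomial probability function."""
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * f for x, f in enumerate(pmf))
    second_raw = sum(x * x * f for x, f in enumerate(pmf))
    return mean, second_raw - mean ** 2

mean, var = binomial_moments(1000, Fraction(1, 100))

print("mean     :", mean)            # compare with N * p
print("variance :", var)             # compare with N * p * q
print("std. dev.:", float(var) ** 0.5)
```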


4.22. Moment Generating function.

$$M_x(t) = E\,e^{tX} = \sum_{0 \le x \le N} e^{tx} \binom{N}{x} p^x q^{N-x} = \sum_{0 \le x \le N} \binom{N}{x} (pe^t)^x q^{N-x} = (q + pe^t)^N \qquad (4.8)$$

[since $(q+pe^t)^N$ when expanded gives all the terms in $\sum_{0 \le x \le N} \binom{N}{x} (pe^t)^x q^{N-x}$].

$\therefore$ For the Binomial distribution the M.G.F. is

$$M_x(t) = (q + pe^t)^N.$$

The various raw moments may be obtained by differentiation.

$$\frac{d}{dt} M_x(t) = N(q+pe^t)^{N-1}\, pe^t$$

$$\frac{d}{dt} M_x(t)\Big|_{t=0} = N(q+p)^{N-1} p = Np \quad (\text{since } q+p = 1) \qquad (4.9)$$

$$E(X) = \mu = \mu_1' = Np$$

$$\mu_2' = \frac{d^2}{dt^2} M_x(t)\Big|_{t=0} = Np(q+p)^{N-1} + N(N-1)p^2 (q+p)^{N-2} = Np + N(N-1)p^2.$$

The variance of X $= \sigma^2 = \mu_2' - \mu_1'^2$

$$= Np + N(N-1)p^2 - (Np)^2 = Np - Np^2 = Np(1-p) = Npq. \qquad (4.10)$$

The standard deviation

$$\sigma = \sqrt{Npq}. \qquad (4.11)$$


Higher order moments may be obtained by successive 
differentiation. 

Ex. 4.22.1. The moment generating function of a s.v. X is
given as

$$M_x(t) = \frac{1}{256}\,(3+e^t)^4;$$

find the probability function of X.

Sol.

$$M_x(t) = \frac{1}{256}(3+e^t)^4 = \frac{1}{4^4}(3+e^t)^4 = \left(\frac{3}{4} + \frac{1}{4}e^t\right)^4.$$

This is of the form $(q+pe^t)^N$ where $q = 3/4$, $p = 1/4$ and $N = 4$.
$\therefore$ By the uniqueness theorem, the corresponding distribution
is Binomial with probability function $\binom{4}{x}(1/4)^x(3/4)^{4-x}$.
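The identification of the distribution from its moment generating function can be checked numerically: the M.G.F. computed from the Binomial probability function with N = 4 and p = 1/4 should agree with the given expression at every t. The sketch below compares the two at a few arbitrary values of t; the function names are hypothetical.

```python
from math import comb, exp

def mgf_from_pmf(t, n=4, p=0.25):
    """E[e^(tX)] computed term by term from the Binomial probability function."""
    return sum(exp(t * x) * comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(n + 1))

def mgf_given(t):
    """The moment generating function stated in the example: (3 + e^t)^4 / 256."""
    return (3 + exp(t)) ** 4 / 256

for t in (-1.0, 0.0, 0.5, 2.0):
    print(t, mgf_from_pmf(t), mgf_given(t))
```

Agreement at a handful of points is only a sanity check; the conclusion itself rests on the uniqueness theorem.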


In the Binomial law, the stochastic variable X is the number
of successes in a Binomial situation. If we consider the proportion
of successes $X/N$ then the expected value and variance of the
stochastic variable $Y = X/N$ may be easily obtained as

$$E(Y) = E\left(\frac{X}{N}\right) = \frac{1}{N}E(X)$$

(since N is a constant with respect to X)

$$= \frac{Np}{N} = p. \qquad (4.12)$$

$$\mathrm{Var}(Y) = \frac{1}{N^2}\mathrm{Var}(X) \quad [\text{since } \mathrm{Var}(aX+b) = a^2\,\mathrm{Var}(X)]$$

$$= \frac{Npq}{N^2} = \frac{pq}{N}. \qquad (4.13)$$

The standard deviation of

$$Y = (pq/N)^{1/2}. \qquad (4.14)$$


It is interesting to see that the factorial moments are easier
to evaluate when X is a Binomial variate. For example the
second factorial moment $\mu_{[2]}$, by definition, is

$$\mu_{[2]} = E\,X(X-1) = \sum_{0 \le x \le N} x(x-1)\binom{N}{x} p^x q^{N-x} = N(N-1)p^2. \qquad (4.15)$$

Similarly $\mu_{[r]}$ for $r = 1, 2, 3, \ldots$ may be easily evaluated.

4.23. Recurrence formula. For convenience of evaluation
of Binomial probabilities a recurrence formula may be
obtained as follows.

$$f(x, \theta) = \frac{N!}{x!\,(N-x)!}\, p^x q^{N-x}$$

$$f(x+1, \theta) = \frac{N!}{(x+1)!\,(N-x-1)!}\, p^{x+1} q^{N-x-1}$$

(x is replaced by x+1).

$$\therefore\ \frac{f(x+1, \theta)}{f(x, \theta)} = \frac{N-x}{x+1} \cdot \frac{p}{q}$$

(obtained by cancelling out all the common factors)

$$\therefore\ f(x+1, \theta) = \frac{N-x}{x+1} \cdot \frac{p}{q}\, f(x, \theta). \qquad (4.16)$$

From the formula if we know $f(0, \theta)$ and p we can evaluate
$f(1, \theta), f(2, \theta), \ldots$

$$f(1, \theta) = \frac{N}{1} \cdot \frac{p}{q}\, f(0, \theta) = N\,\frac{p}{q}\, f(0, \theta).$$

But

$$f(0, \theta) = \binom{N}{0} p^0 q^{N-0} = q^N = (1-p)^N.$$

Therefore if $f(0, \theta)$ and p are given, N is obtained and thereby
$f(1, \theta)$ is obtained and

$$f(2, \theta) = \frac{N-1}{2} \cdot \frac{p}{q}\, f(1, \theta).$$

$\therefore$ $f(2, \theta)$ is obtained. Proceeding like this all other
probabilities are obtained. For computational purposes the
Binomial probabilities $\binom{N}{x} p^x q^{N-x}$ for various values of N, p
and x are tabulated. Extracts of these tables are given at the
end of this book. References to more tables are given at the end
of this chapter.
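The recurrence formula lends itself to a short program: start from f(0) = q^N and multiply repeatedly by (N - x)/(x + 1) times p/q. The sketch below is illustrative; the function name `binomial_table` and the values N = 10, p = 0.4 are assumptions, and the direct formula with `math.comb` is used only as a cross-check.

```python
from math import comb

def binomial_table(n, p):
    """Return [f(0), f(1), ..., f(n)] using the recurrence
    f(x + 1) = (n - x) / (x + 1) * (p / q) * f(x), starting from f(0) = q^n."""
    q = 1 - p
    probs = [q ** n]
    for x in range(n):
        probs.append(probs[-1] * (n - x) / (x + 1) * p / q)
    return probs

n, p = 10, 0.4
probs = binomial_table(n, p)
direct = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

for x, (a, b) in enumerate(zip(probs, direct)):
    print(x, round(a, 6), round(b, 6))
```

The recurrence avoids recomputing factorials for every x, which is why it was convenient for hand computation of tables.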




Exercises 

4.1. For a Binomial distribution with parameters N=5 and p=0.3
find the probability of getting (a) exactly 3 failures, (b) at least 3 successes,
(c) at most 3 ….

4.2. Find the probability of getting (a) exactly 4 heads, (b) at least …
tails, (c) at most 3 heads, when an unbiased coin is thrown 6 times.

4.3. Assuming a Binomial probability situation with p=1/2, what is
the probability that out of 20 babies born in a hospital 16 are boys?

4.4. Show that for p=1/2 the Binomial distribution has a maximum
at x=N/2 when N is even and at x=(N-1)/2 and (N+1)/2 when N is odd.

4.5. The probability that a bomb hits a target is given to be 0.80.
Assuming a Binomial situation, what is the probability that out of 10 bombings
exactly 4 will be misses?

4.6. In a game of taking a chance, a contestant has to give correct
answers to 4 out of 5 questions to win the contest. Questions are given
with … answers each, out of which one is a correct answer. If a contestant
answers the questions by selecting the answers at random, what is the
probability that he will win the contest?

4.7. For a Binomial distribution with parameters p and N show
that

$$\mu_{r+1} = pq\left[N r\,\mu_{r-1} + \frac{d\mu_r}{dp}\right]$$

where $\mu_r$ is the $r$th central moment and $q = 1-p$. Find … by using this
recurrence relation.

4.8. A naturalist claims that only 10% of the birds of a particular
category are affected by radio-active fallout. A testing service tries
to check this claim by checking a sample of 30 birds. The testing
service will accept the claim if only 0 or … birds in the sample are affected.
What are the probabilities that (1) …,
(2) the service will reject the claim if the claim is true?

4.9. A manufacturer of razor blades claims that only 4% of the blades
do not meet the specified quality limits. A customer will accept purchases
of 12 blades if 0 or 1 does not meet the quality limits; otherwise he will
return the purchase. Assuming that this is a Binomial probability situation,
and the purchase is a random sample of size 12, what are the probabilities
that (1) he will return the purchase when the manufacturer's claim is true,
(2) he will accept the purchase when actually 5% of the manufacturer's
razor blades do not meet quality limits?

4.24. Poisson Distribution. The probability function for
the Poisson distribution is seen to be (section 4.1),

$$f(x, \theta) = \frac{\lambda^x e^{-\lambda}}{x!} \text{ for } x = 0, 1, \ldots, \infty$$

and $f(x, \theta) = 0$ elsewhere, where $\lambda$ (lambda) is a parameter and
$\lambda > 0$. (4.17)

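The Poisson probability law may be obtained as a limiting
form of the Binomial probability function

$$f(x, \theta) = \binom{N}{x} p^x q^{N-x}.$$

Let us consider the situation when $N \to \infty$, $p \to 0$ but $Np$ remains a
finite constant $\lambda$ always; i.e., the number of trials is very very
large, the probability of success in any trial is very very small
but $Np = \lambda$ is a finite constant. This situation may be called a
Poisson situation and we will show that in this situation the
Binomial probability law tends to the Poisson probability law.

For the Binomial distribution

$$f(x, \theta) = \frac{N(N-1)\cdots(N-x+1)}{x!}\, p^x q^{N-x}$$

where $q = 1-p$. If $Np = \lambda$ then $p = \lambda/N$.

$$\therefore\ f(x, \theta) = \frac{1}{x!}\, N(N-1)\cdots(N-x+1) \left(\frac{\lambda}{N}\right)^x \left(1-\frac{\lambda}{N}\right)^{N-x}$$

$$= \frac{\lambda^x}{x!} \cdot \frac{N}{N} \cdot \frac{(N-1)}{N} \cdots \frac{(N-x+1)}{N} \left(1-\frac{\lambda}{N}\right)^{N} \left(1-\frac{\lambda}{N}\right)^{-x}. \qquad (4.18)$$

When $N \to \infty$,

$$\left(1-\frac{1}{N}\right)\left(1-\frac{2}{N}\right)\cdots\left(1-\frac{x-1}{N}\right) \to 1 \quad (\text{since } x \text{ is finite})$$

and

$$\left(1-\frac{\lambda}{N}\right)^{N} \to e^{-\lambda} \quad \left[\text{since } \lim_{n \to \infty}\left(1+\frac{a}{n}\right)^n = e^a\right]$$

and

$$\left(1-\frac{\lambda}{N}\right)^{-x} \to 1 \quad (\text{since } x \text{ and } \lambda \text{ are finite quantities}).$$

$$\therefore\ f(x, \theta) \to \frac{\lambda^x}{x!}\, e^{-\lambda} \qquad (4.19)$$

where $\theta$ represents the only parameter $\lambda$.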
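The limiting argument can be watched numerically by holding Np = λ fixed and letting N grow. The sketch below compares a Binomial probability with the corresponding Poisson probability for increasing N; the values λ = 2 and x = 3 and the function names are assumptions chosen for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability lam^x * e^(-lam) / x!."""
    return lam ** x * exp(-lam) / factorial(x)

lam, x = 2.0, 3
sizes = (10, 100, 1000, 10000)

# Absolute gap between the Binomial value (with p = lam / N) and the Poisson limit.
gaps = [abs(binomial_pmf(x, n, lam / n) - poisson_pmf(x, lam)) for n in sizes]

for n, gap in zip(sizes, gaps):
    print(n, gap)
```

In this run the gap shrinks as N increases, as the limit suggests.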

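When we take the limit of a probability function it is
possible that the limiting form may not satisfy the
conditions for a probability function, i.e., the limiting form may not
be a probability function. Let us examine whether $\dfrac{\lambda^x}{x!} e^{-\lambda}$ is
a probability function. The conditions are

(1) $f(x) \ge 0$

(2) $\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = 1$ or $\displaystyle\sum_{-\infty < x < \infty} f(x) = 1$.

Here $\dfrac{\lambda^x}{x!} e^{-\lambda}$ is evidently $\ge 0$ since $x = 0, 1, 2, \ldots$ and
$\lambda$ is a positive quantity ($\lambda = Np$).

$$\sum_{0 \le x < \infty} \frac{\lambda^x}{x!}\, e^{-\lambda} = e^{-\lambda} \sum_{0 \le x < \infty} \frac{\lambda^x}{x!} = e^{-\lambda}\left[1 + \frac{\lambda}{1!} + \frac{\lambda^2}{2!} + \cdots\right] = e^{-\lambda}\, e^{\lambda} = 1$$

$$\left(\text{since } e^{\lambda} = 1 + \frac{\lambda}{1!} + \frac{\lambda^2}{2!} + \cdots\right).$$

$\therefore\ \dfrac{\lambda^x}{x!}\, e^{-\lambda} = f(x)$ is a probability function. (4.20)

This probability function is known as the Poisson probability
function and we will denote this by our general notation

$$f(x, \theta) = \begin{cases} \dfrac{\lambda^x}{x!}\, e^{-\lambda} & \text{for } x = 0, 1, 2, \ldots \\ 0 & \text{elsewhere,} \end{cases}$$

where $\lambda > 0$ is a parameter.

This was first introduced by a French mathematician called
S. Poisson and hence it is called the Poisson distribution.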

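4.24.1. Moments of the Poisson Distribution.

$$E(X) = \sum_{0 \le x < \infty} x\, \frac{\lambda^x}{x!}\, e^{-\lambda} = \sum_{1 \le x < \infty} x\, \frac{\lambda^x}{x!}\, e^{-\lambda}$$

(when x=0 the corresponding term is zero)

$$= \lambda e^{-\lambda} \sum_{1 \le x < \infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda}\left[1 + \frac{\lambda}{1!} + \frac{\lambda^2}{2!} + \cdots\right] = \lambda e^{-\lambda} \cdot e^{\lambda} = \lambda. \qquad (4.21)$$

The second factorial moment

$$\mu_{[2]} = E\,X(X-1) = E(X^2) - E(X) = \mu_2' - \mu_1' = \sum_{x=0}^{\infty} x(x-1)\, \frac{\lambda^x}{x!}\, e^{-\lambda} = \lambda^2 \quad (\text{see Ex. 3.34.2}). \qquad (4.22)$$

$$\mathrm{Var}(X) = \mu_2 = \mu_2' - \mu^2 = [\mu_{[2]} + \mu_1'] - \mu^2 = \lambda^2 + \lambda - \lambda^2 = \lambda. \qquad (4.23)$$

The standard deviation $= \sqrt{\lambda}$.

For a Poisson distribution the mean = the variance = $\lambda$.
The reader may evaluate $\mu_3$ and $\mu_4$.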
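4.24.2. The moment generating function for the
Poisson distribution.

$$M_x(t) = E\,e^{tX} = \sum_{x=0}^{\infty} e^{tx}\, \frac{\lambda^x}{x!}\, e^{-\lambda} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!}$$

$$= e^{-\lambda}\left[1 + \frac{\lambda e^t}{1!} + \frac{(\lambda e^t)^2}{2!} + \cdots\right] = e^{-\lambda} \cdot e^{\lambda e^t} = e^{\lambda(e^t - 1)}. \qquad (4.24)$$

By differentiating

$$M_x(t) = e^{\lambda(e^t - 1)}$$

with respect to t the various moments can be easily obtained.

The probability of getting exactly x successes in a Poisson
situation is

$$f(x, \theta) = \frac{\lambda^x}{x!}\, e^{-\lambda}.$$

A recurrence relation may be obtained since

$$f(x+1, \theta) = \frac{\lambda}{x+1}\, f(x, \theta).$$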

We have seen that a Binomial distribution approximates to a
Poisson distribution when $N \to \infty$, $p \to 0$ but $Np = \lambda$, a constant. If
$p \to 0$ the probability of a success in any trial is very very small.
So the Poisson probability law is sometimes called the probability
law for rare events. This law may be applied to situations like
death due to snake bite, delivery of quintuplets, emission of alpha
particles, traffic accidents etc. Since the Binomial law becomes a
Poisson law only when $N \to \infty$, $p \to 0$ and $Np = \lambda$ is a constant, in
practical situations we evaluate the probabilities by the Binomial
law when N is not sufficiently large and p or q does not tend
to zero.

A hospital switch board receives an average of 4 

that n™th CallS in \ 10 minute in te™al. What is the probability 
m ^ at th \ most 2 emergency calls in a 10 minute inter - 
( 2 ) there are exactly 3 emergency calls in a 10 minute interval ? 



^ introduction to statistical mathematics 

Sol. The distribution of the number of emergency calls may
be considered to be a Poisson distribution with mean

$$E(X) = \lambda = 4$$

in a unit time of 10 minutes.

The probability of getting exactly x emergency calls in a
unit time of 10 minutes

$$= f(x, \theta) = \frac{\lambda^x}{x!}\, e^{-\lambda} = \frac{4^x}{x!}\, e^{-4}.$$

(1) The probability of getting at the most 2 emergency calls

= Prob. of getting 0 or 1 or 2 calls

$$= \frac{4^0}{0!}\, e^{-4} + \frac{4^1}{1!}\, e^{-4} + \frac{4^2}{2!}\, e^{-4} = e^{-4}\,[1 + 4 + 8] = 13\, e^{-4}.$$

(2) The probability of getting exactly 3 emergency calls

$$= \frac{4^3}{3!}\, e^{-4} = \frac{32}{3}\, e^{-4} = 0.1954.$$
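The two answers of the switch board example can be checked numerically.
The following is a minimal sketch; the helper name `poisson_pmf` is an
assumption introduced only for this illustration.

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x successes for a Poisson law with mean lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 4.0                                   # mean number of calls per 10 minutes
p_at_most_2 = sum(poisson_pmf(x, lam) for x in range(3))   # x = 0, 1, 2
p_exactly_3 = poisson_pmf(3, lam)
print(round(p_at_most_2, 4), round(p_exactly_3, 4))        # prints 0.2381 0.1954
```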


Ex. 4.24.2. Suppose that the probability of death in case of
influenza is 0.01. If 20 influenza patients are there in a hospital
what is the probability that exactly 2 patients will die ?
Approximate the probability by a Poisson distribution and evaluate
the error in this approximation.

This may be taken as a Binomial situation with

N = 20, p = 0.01 and q = 0.99.

The probability that exactly 2 patients will die is

$$= \binom{20}{2} (0.01)^2 (0.99)^{18} = 0.0159.$$

If the Binomial probabilities are approximated by Poisson
probabilities, then $\lambda = Np = 20 \times 0.01 = 0.2$.

The probability of getting exactly 2 successes

$$= \frac{\lambda^2}{2!}\, e^{-\lambda} = \frac{(0.2)^2}{2!}\, e^{-0.2} = 0.0164.$$

The error in the approximation

= 0.0164 - 0.0159 = 0.0005.

Comments. For practical purposes a good Poisson approximation
to the Binomial probabilities is obtained for an N which is
as small as 20 provided Np < 5. Poisson probabilities are easier
to evaluate. So when N is sufficiently large and p is sufficiently
small a Poisson approximation may be enough for practical purposes.
Poisson probabilities are tabulated for various values of $\lambda$
and x. A table is given at the end of this book and references are
given at the end of this chapter.
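The comparison between the Binomial probability and its Poisson
approximation in the influenza example can be reproduced with a few lines
of code. This is a sketch only; the function names are assumptions made
for the illustration.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson probability with mean lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

n, p = 20, 0.01
exact = binomial_pmf(2, n, p)          # Binomial probability
approx = poisson_pmf(2, n * p)         # Poisson approximation with lambda = Np
print(round(exact, 4), round(approx, 4), round(approx - exact, 4))
```

Changing `n` and `p` while keeping `n * p` fixed shows how the error shrinks
as N grows.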

A Poisson distribution may also be obtained independently
(i.e., without considering it as a limiting form of the Binomial
distribution). Consider the following experimental situation :

(1) The probability of a success in a small time interval from
t to $t + \Delta t$ is $\alpha\, \Delta t$ where $\alpha$ is a positive
constant and $\Delta t$ denotes a small increment in time at t.

(2) The probability of getting more than one success in this
interval is negligibly small (we will assume this to be zero).

(3) The probability of a success in the interval t to $t + \Delta t$
does not depend on the success or failure prior to time t.

Under these three conditions it may be shown that the probability
of getting exactly x successes in the time t, say, f(x, t), is given by

$$f(x, t) = \frac{\lambda^x}{x!}\, e^{-\lambda} \quad \text{for } x = 0, 1, 2, \dots$$                          (4.25)

where $\lambda = \alpha t$ and $\alpha$ and t are both positive quantities.

The event of getting exactly x successes in time $t + \Delta t$ may
be partitioned into two mutually exclusive events, namely,
exactly x successes in time t and then a failure in the interval
t to $t + \Delta t$, or exactly x-1 successes in time t and a success
in the interval t to $t + \Delta t$.



Fig. 4.1. 

f(x, t) = probability of getting exactly x successes in time t

$f(x, t + \Delta t)$ = probability of getting exactly x successes in time $t + \Delta t$

f(x-1, t) = probability of getting exactly x-1 successes in time t

Probability of a success in time t to $t + \Delta t$ $= \alpha\, \Delta t$

The probability of a failure in this interval

$= 1 - \alpha\, \Delta t.$

(We assumed that the prob. of getting
more than one success is negligible).

$$f(x, t + \Delta t) = f(x-1, t)\, \alpha\, \Delta t + f(x, t)\,(1 - \alpha\, \Delta t)$$

$$= f(x, t) + [f(x-1, t) - f(x, t)]\, \alpha\, \Delta t$$                          (4.26)

$$\frac{f(x, t + \Delta t) - f(x, t)}{\Delta t} = \alpha\, [f(x-1, t) - f(x, t)]$$                          (4.27)

Taking limits when $\Delta t \to 0$,

$$\frac{\partial}{\partial t} f(x, t) = \alpha\, [f(x-1, t) - f(x, t)]$$                          (4.28)

Solving this difference-differential equation it can be shown
that

$$f(x, t) = \frac{\lambda^x}{x!}\, e^{-\lambda}$$

where $\lambda = \alpha t$, x = 0, 1, 2, ...

The proof is beyond the scope of this book.
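Although the analytical solution is not given here, the
difference-differential equation can be examined numerically. The sketch
below steps the equation forward in small time increments (a simple Euler
scheme, which is an assumption of this illustration and not the method of
the text), starting from f(0, 0) = 1, and compares the result with the
Poisson probabilities. The name `solve_counts` and the values
$\alpha = 2$, t = 3 are chosen only for the example.

```python
import math

def solve_counts(alpha, t, x_max, steps):
    """Euler-integrate df(x,t)/dt = alpha*(f(x-1,t) - f(x,t)) with f(0,0) = 1."""
    dt = t / steps
    f = [1.0] + [0.0] * x_max              # at time 0 there are no successes
    for _ in range(steps):
        new = f[:]
        for x in range(x_max + 1):
            inflow = f[x - 1] if x > 0 else 0.0
            new[x] = f[x] + alpha * dt * (inflow - f[x])
        f = new
    return f

alpha, t = 2.0, 3.0                        # illustrative values
numeric = solve_counts(alpha, t, x_max=10, steps=20000)
lam = alpha * t
for x in (0, 2, 4, 6):
    exact = lam**x * math.exp(-lam) / math.factorial(x)
    print(x, round(numeric[x], 4), round(exact, 4))
```

Each Euler step is exactly the relation (4.26), so the numerical values
approach the Poisson probabilities as the number of steps is increased.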


Ex. 4.24.3. Alpha particles are emitted by a radioactive 
source at the rate of 2 per every 5 minutes on the average. Assuming 
this as a Poisson situation, what is the probability of getting exactly 
4 emissions in 15 minutes. 


Sol. Here 5 minutes is a unit of time. So we have to find 
out the probability of getting 4 successes in 3 units of time. 
According to the notation, a = 2 and t = 3 and x =4. 


The required probability

$$= \frac{(\alpha t)^x}{x!}\, e^{-\alpha t} = \frac{6^4}{4!}\, e^{-6} = 0.1339.$$


Exercises 


4.10. For a Binomial distribution with parameters N=50 and p=0.1
compare the Binomial probabilities with the approximating Poisson
probabilities for x=0, 1, 2, 3, 4 and 6.

4.11. Show that the M.G.F. of a Binomial distribution approaches the
M.G.F. of a Poisson distribution when $N \to \infty$, $p \to 0$ but
$Np = \lambda$ is a constant. Hence show that the Binomial distribution
tends to a Poisson distribution under these conditions.

4.12. A machine producing electric bulbs is known to produce 1%
defectives. What is the probability of getting 3 defective bulbs if a random
sample of size 20 is chosen ? Approximate this probability by a Poisson
distribution.


4 . 13 . The following moment generating functions are given. Use the 
uniqueness theorem to determine the corresponding probability distributions. 

(«) (1 /2+e*/2)», (6) (2+e*)3/27, (c) exp. 2(e*-l). 
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Differentiating p r =E(X—X) r for a Poisson distribution with 
parameter $\lambda$, obtain the recurrence formula

$$\mu_{r+1} = \lambda \left[ r\, \mu_{r-1} + \frac{d\mu_r}{d\lambda} \right].$$

Find $\mu_2$ and $\mu_3$.

4.15. The number of traffic accidents in a city each month is assumed
to be a Poisson variate with the parameter $\lambda = 4$. What is the
probability that (a) there are 6 accidents in a certain month, (b) there
are fewer than 4 accidents in a certain month ?

4.16. An office switch board receives telephone calls at the rate of 3
calls per minute on the average. What is the probability of receiving
(a) ... in a one minute interval, (b) at the most 3 calls in a 5 minute
interval ?

4.17. The probability that a vehicle will pass through a particular
point in a small interval of time $\Delta t$ is $0.1\, \Delta t$, time
being measured in minutes. What is the probability that a traffic counting
device at that point will show (a) ... counts in a one minute interval,
(b) at the most 5 counts in a 20 minute interval ?

4.3. THE HYPERGEOMETRIC DISTRIBUTION 

If the binomial probability law is to be applied to an experimental
situation then the situation should be a binomial probability
situation. If the probability of success in any trial does not
remain the same from trial to trial then the binomial law cannot
be applied. In this case we will see that a hypergeometric
distribution is appropriate.

Let us consider the following situation. Suppose that we
have a+b objects out of which a are of one type and b are of a
second type. n objects are taken at random from this set of a+b
objects. The probability that x of the n objects taken belong to
the set containing a objects and the remaining n-x objects belong
to the second set containing b objects, is evidently

$$\frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}}.$$

Thus

$$f(x, \theta) = \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}}$$ for x = 0, 1, 2, ..., n or a (whichever is smaller) and $n - x \le b$

= 0 elsewhere                          (4.29)

is called the hypergeometric distribution where a, b, n are
parameters. For different values of a, b, n different hypergeometric
distributions are obtained, and for specific values of a, b and n (for
example a=10, b=20, n=13) the hypergeometric distribution is
completely specified. It can be easily verified that

$$\sum_{x=0}^{n} \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}} = 1.$$

The hypergeometric distribution may also be considered to be
... b objects is evidently given by the hypergeometric probability law,
if there are only a+b objects and n are taken without replacement.

Ex. 4.3.1. In a basket of 100 tomatoes 20 are rotten. Fifteen
tomatoes are picked up at random one by one. (Here at random
does not mean haphazardly, but it means that when one tomato is
taken every tomato in the basket is given an equal chance of being
taken). What is the probability that

(1) there are exactly 2 rotten tomatoes,

(2) there are at the most 2 rotten tomatoes,

(3) there are at least 2 rotten tomatoes in the sample ?

Sol. The 100 tomatoes may be classified into two types (80
good and 20 bad).

(1) The required probability

$$= \frac{\binom{20}{2}\binom{80}{13}}{\binom{100}{15}}.$$

(2) At most 2 rotten tomatoes means 0 or 1 or 2 bad ones.

The required probability

$$= \frac{\binom{20}{0}\binom{80}{15}}{\binom{100}{15}} + \frac{\binom{20}{1}\binom{80}{14}}{\binom{100}{15}} + \frac{\binom{20}{2}\binom{80}{13}}{\binom{100}{15}}.$$

(3) At least 2 rotten tomatoes means 2 or 3 or ... or 15 bad
ones.

The required probability

$$= \frac{\binom{20}{2}\binom{80}{13}}{\binom{100}{15}} + \frac{\binom{20}{3}\binom{80}{12}}{\binom{100}{15}} + \dots + \frac{\binom{20}{15}\binom{80}{0}}{\binom{100}{15}}$$

= the probability of not getting exactly 0 or 1 rotten ones

= 1 - prob. of getting 0 or 1 bad ones

$$= 1 - \frac{\binom{20}{0}\binom{80}{15}}{\binom{100}{15}} - \frac{\binom{20}{1}\binom{80}{14}}{\binom{100}{15}}.$$

These probabilities may be completely evaluated by using a
table of Binomial coefficients or by using logarithms. These
computations are left to the reader.
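Where tables of Binomial coefficients are not at hand, the three
probabilities of the tomato example may be evaluated directly. The
following is a minimal sketch; the function name `hypergeom_pmf` is an
assumption made for the illustration.

```python
from math import comb

def hypergeom_pmf(x, a, b, n):
    """P(x objects of the first type) when n are drawn from a + b without replacement."""
    if x < 0 or x > min(a, n) or n - x > b:
        return 0.0
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)

a, b, n = 20, 80, 15                      # 20 rotten, 80 good, 15 drawn
exactly_2 = hypergeom_pmf(2, a, b, n)
at_most_2 = sum(hypergeom_pmf(x, a, b, n) for x in range(3))
at_least_2 = 1 - hypergeom_pmf(0, a, b, n) - hypergeom_pmf(1, a, b, n)
print(round(exactly_2, 4), round(at_most_2, 4), round(at_least_2, 4))
```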

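Ex. 4.3.2. From a well shuffled deck of 52 playing cards 2
cards are selected at random. What is the probability that both
are aces ?

Sol. There are altogether 4 aces. So these 52 cards may
be divided into two types, namely a set of 4 aces and a set of 48
non-aces.

The required probability

$$= \frac{\binom{4}{2}\binom{48}{0}}{\binom{52}{2}} = \frac{4 \times 3}{52 \times 51} = \frac{1}{221}.$$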

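4.31. Moments.

$$\mu = E(X) = \sum_{x=0}^{n} x\, \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}} = \frac{1}{\binom{a+b}{n}} \sum_{x=1}^{n} x \binom{a}{x}\binom{b}{n-x}$$

(when x=0 the corresponding term is zero)

$$= \frac{a}{\binom{a+b}{n}} \sum_{x=1}^{n} \binom{a-1}{x-1}\binom{b}{n-x}.$$

If we put x-1=y, a-1=A and n-1=N, then E(X) may be
written as

$$E(X) = \frac{a}{\binom{a+b}{n}} \sum_{y=0}^{N} \binom{A}{y}\binom{b}{N-y}$$                          (4.30)

$$= \frac{a}{\binom{a+b}{n}} \binom{A+b}{N}$$

[since $\sum_{x=0}^{r} \binom{a}{x}\binom{b}{r-x} = \binom{a+b}{r}$.
This may be seen by comparing the
coefficients of $x^r$ on both sides of
$(1+x)^a (1+x)^b = (1+x)^{a+b}$]

$$= \frac{a \binom{a+b-1}{n-1}}{\binom{a+b}{n}} = \frac{a\,n}{a+b}.$$

$$E(X) = \frac{a\,n}{a+b}$$                          (4.31)

when X is a hypergeometric variate.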

The variance of X may be easily obtained from the second
factorial moment $\mu_{[2]} = E\,X(X-1)$.

$$\mu_{[2]} = E\,X(X-1) = \sum_{x=0}^{n} x(x-1)\, \frac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}}$$

$$= \frac{1}{\binom{a+b}{n}} \sum_{x=2}^{n} x(x-1) \binom{a}{x}\binom{b}{n-x}$$

$$= \frac{1}{\binom{a+b}{n}} \sum_{x=2}^{n} x(x-1)\, \frac{a!}{x!\,(a-x)!} \binom{b}{n-x}$$

$$= \frac{a(a-1)}{\binom{a+b}{n}} \sum_{x=2}^{n} \frac{(a-2)!}{(x-2)!\,(a-x)!} \binom{b}{n-x}.$$

Put a-2=A, x-2=y, n-2=N; then $\mu_{[2]}$ becomes                          (4.32)

$$\frac{a(a-1)}{\binom{a+b}{n}} \sum_{y=0}^{N} \frac{A!}{y!\,(A-y)!} \binom{b}{N-y} = \frac{a(a-1)}{\binom{a+b}{n}} \sum_{y=0}^{N} \binom{A}{y}\binom{b}{N-y}$$

$$= \frac{a(a-1)}{\binom{a+b}{n}} \binom{A+b}{N} = \frac{a(a-1) \binom{a+b-2}{n-2}}{\binom{a+b}{n}}$$

$$= \frac{a(a-1)\, n(n-1)}{(a+b)(a+b-1)}.$$

But $\mu_{[2]} = E\,X(X-1) = E(X^2) - E(X)$, so that

$$E(X^2) = \mu_{[2]} + E(X) = \frac{a(a-1)\, n(n-1)}{(a+b)(a+b-1)} + \frac{a\,n}{a+b}.$$                          (4.33)

But $\mathrm{Var}(X) = \mu_2 = E(X^2) - (EX)^2$

$$= \frac{a(a-1)\, n(n-1)}{(a+b)(a+b-1)} + \frac{a\,n}{a+b} - \left(\frac{a\,n}{a+b}\right)^2$$

$$= \frac{n\,a\,b\,(a+b-n)}{(a+b)^2 (a+b-1)}.$$                          (4.34)

The standard deviation is

$$\left[ \frac{n\,a\,b\,(a+b-n)}{(a+b)^2 (a+b-1)} \right]^{1/2}.$$                          (4.35)

In Ex. 4.3.1 we can expect on the average

$$\frac{a\,n}{a+b} = \frac{20 \times 15}{20 + 80} = 3$$

bad tomatoes in the sample. In a Binomial situation E(X) is seen
to be Np. In a Hypergeometric situation E(X) is $\frac{a}{a+b}\, n$.
It may be easily verified that a Hypergeometric distribution may be
approximated by a Binomial distribution by taking $p = \frac{a}{a+b}$.
In Ex. 4.3.1 if we use this approximation the probability of getting
exactly 2 bad tomatoes from the lot is approximately equal to

$$\binom{15}{2} \left(\frac{20}{100}\right)^2 \left(\frac{80}{100}\right)^{13} = 0.231.$$

The exact probability is

$$\frac{\binom{20}{2}\binom{80}{13}}{\binom{100}{15}} = 0.236.$$

A recurrence relation may be obtained for a Hypergeometric
distribution also.
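The formulae for the mean and the variance, and the Binomial
approximation, can be verified numerically for the tomato example. The
sketch below is illustrative only; the helper `hypergeom_pmf` is an
assumed name, and the moments are obtained by direct summation over the
probability function.

```python
from math import comb

def hypergeom_pmf(x, a, b, n):
    """Hypergeometric probability of x objects of the first type."""
    if x < 0 or x > min(a, n) or n - x > b:
        return 0.0
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)

a, b, n = 20, 80, 15
xs = range(0, min(a, n) + 1)
mean = sum(x * hypergeom_pmf(x, a, b, n) for x in xs)
var = sum((x - mean) ** 2 * hypergeom_pmf(x, a, b, n) for x in xs)

mean_formula = a * n / (a + b)                                   # (4.31)
var_formula = n * a * b * (a + b - n) / ((a + b) ** 2 * (a + b - 1))   # (4.34)

p = a / (a + b)                                                   # Binomial approximation
binom_2 = comb(n, 2) * p**2 * (1 - p) ** (n - 2)
print(round(mean, 4), round(var, 4), round(binom_2, 3), round(hypergeom_pmf(2, a, b, n), 3))
```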


4.4. THE NEGATIVE BINOMIAL DISTRIBUTION 

Here we will consider the Binomial probability situation with
a slight modification. Consider the situation where (1) the
trials are independent, (2) the probability of success p in a trial
remains the same from trial to trial. Suppose that we are interested
in finding out the probability of getting the k-th success at
the x-th trial. Here, evidently, the number of trials is not a
constant. The required probability may be obtained by considering
the events of getting exactly k-1 successes in x-1 trials, and
the x-th trial resulting in a success ; i.e., the required
probability

$$= \binom{x-1}{k-1} p^{k-1} q^{x-k} \cdot p = \binom{x-1}{k-1} p^k q^{x-k} \quad \text{for } x = k, k+1, \dots$$

$$f(x, \theta) = \binom{x-1}{k-1} p^k q^{x-k} \quad \text{for } x = k, k+1, \dots$$                          (4.36)

= 0 elsewhere, 0<p<1, q = 1-p,

gives the probability law, called the Negative Binomial Distribution.
Here p and k are parameters. The various probabilities
may be easily seen to be the different terms in the binomial
expansion of

$$\left( \frac{1}{p} - \frac{q}{p} \right)^{-k} = p^k (1 - q)^{-k}.$$

Hence the distribution is called a Negative Binomial
Distribution.

Ex. 4.4.1. A person is throwing stones at a target. What is the
probability that his 10th throw is the 5th hit, if the probability of hitting
the target at any trial is 0.40 ?

Sol. This is a Negative Binomial situation. According to
the above notation x=10, k=5, and p=0.40.

The required probability

$$= \binom{x-1}{k-1} p^k q^{x-k} = \binom{9}{4} (0.40)^5 (0.60)^5 = 0.1003.$$
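The Negative Binomial probability of the stone throwing example can be
evaluated as follows. This is a small sketch; the function name
`neg_binomial_pmf` is an assumption made for the illustration.

```python
from math import comb

def neg_binomial_pmf(x, k, p):
    """Probability that the k-th success occurs at the x-th trial."""
    if x < k:
        return 0.0
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

print(round(neg_binomial_pmf(10, 5, 0.40), 4))   # 10th throw is the 5th hit
```

Summing `neg_binomial_pmf(x, k, p)` over x = k, k+1, ... gives a total that
approaches unity, as it should for a probability law.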

4.41. Geometric Distribution. In the negative binomial
distribution, if k=1, the probability that the x-th
trial is the first success (i.e., that x trials are required to get
the first success) is given by

$$f(x, \theta) = p\, q^{x-1} \quad \text{for } x = 1, 2, 3, \dots$$                          (4.37)

= 0 elsewhere, 0<p<1, q = 1-p.

This probability distribution is called the geometric distribution.
This may be derived independently by considering a Binomial
situation where the number of trials is not fixed. Then the
probability that the x-th trial results in the first success is given by
the geometric distribution. The various probabilities for x=1, 2, ...
are the various terms of a geometric progression, and hence, the
distribution is called a geometric distribution.

Ex. 4.41.1. In Ex. 4.4.1 what is the probability that the 4th
attempt is the first hit ?

Sol. Here x=4, p=0.40

and therefore the required probability

$$= 0.40\, (0.60)^3 = 0.0864.$$


The negative binomial and geometric probabilities may be evaluated
by using tables for factorials, binomial probabilities,
logarithms etc. If tables are not available recurrence formulae will
be helpful. The recurrence relationships are given in the following
table.

Name                             The Probability function                                              The Recurrence Relationship

1. The Binomial                  $f(x, \theta) = \binom{n}{x} p^x q^{n-x}$                             $f(x+1, \theta) = \frac{n-x}{x+1} \cdot \frac{p}{q}\, f(x, \theta)$
   distribution                  for x = 0, 1, 2, ..., n
                                 = 0 elsewhere

2. The Poisson                   $f(x, \theta) = \frac{\lambda^x e^{-\lambda}}{x!}$                   $f(x+1, \theta) = \frac{\lambda}{x+1}\, f(x, \theta)$
   distribution                  for x = 0, 1, 2, ...
                                 = 0 elsewhere

3. The Hypergeometric            $f(x, \theta) = \binom{a}{x}\binom{b}{n-x} / \binom{a+b}{n}$          $f(x+1, \theta) = \frac{(n-x)(a-x)}{(x+1)(b-n+x+1)}\, f(x, \theta)$
   distribution                  for x = 0, 1, 2, ..., n
                                 = 0 elsewhere

4. The Negative Binomial         $f(x, \theta) = \binom{x-1}{k-1} p^k q^{x-k}$                         $f(x+1, \theta) = \frac{x}{x-k+1}\, q\, f(x, \theta)$
   distribution                  for x = k, k+1, ...
                                 = 0 elsewhere

5. The Geometric                 $f(x, \theta) = p\, q^{x-1}$                                          $f(x+1, \theta) = q\, f(x, \theta)$
   distribution                  for x = 1, 2, 3, ...
                                 = 0 elsewhere

6. The Discrete uniform          $f(x, \theta) = \frac{1}{n}$                                          $f(x+1, \theta) = f(x, \theta)$
   distribution                  for $x = x_1, x_2, \dots, x_n$
                                 = 0 elsewhere
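The recurrence relationships for the first five distributions can be
checked against the probability functions themselves. The sketch below does
this for one arbitrary set of parameter values; all the numerical values
and the name `recurrence_holds` are assumptions chosen only for the
illustration (m denotes the sample size in the hypergeometric case).

```python
from math import comb, exp, factorial, isclose

# Hypothetical parameter values used only for this check.
n, p, lam, a, b, m, k = 10, 0.3, 2.5, 8, 12, 6, 3
q = 1 - p

# Each entry: (probability function, ratio f(x+1)/f(x), values of x to test)
cases = {
    "binomial": (lambda x: comb(n, x) * p**x * q ** (n - x),
                 lambda x: (n - x) / (x + 1) * p / q, range(0, n)),
    "poisson": (lambda x: lam**x * exp(-lam) / factorial(x),
                lambda x: lam / (x + 1), range(0, 15)),
    "hypergeometric": (lambda x: comb(a, x) * comb(b, m - x) / comb(a + b, m),
                       lambda x: (m - x) * (a - x) / ((x + 1) * (b - m + x + 1)),
                       range(0, min(a, m))),
    "negative binomial": (lambda x: comb(x - 1, k - 1) * p**k * q ** (x - k),
                          lambda x: x / (x - k + 1) * q, range(k, k + 15)),
    "geometric": (lambda x: p * q ** (x - 1), lambda x: q, range(1, 15)),
}

def recurrence_holds(pmf, ratio, xs):
    """True when f(x+1) = ratio(x) * f(x) for every x tested."""
    return all(isclose(pmf(x + 1), ratio(x) * pmf(x), rel_tol=1e-9) for x in xs)

results = {name: recurrence_holds(*case) for name, case in cases.items()}
print(results)
```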


Exercises 


4.18. Obtain the moment generating function for (a) a discrete uniform
distribution with parameter n, (b) a geometric distribution with parameter
p and identify the distributions with the following M.G.F's :

(1) e*(5—(2) (l/2)e*/(l-e</2). 


4.19. If in a small township of 1000 people 40% are conservatives and 
60% are liberals, what is the probability that in a random sample of 100 
people from this township 90% are conservatives and 10% are liberals ? 


4.20. The probability that a swimmer will succeed in swimming across
a lake is 0.4. What is the probability that the 10th swimmer is (a) the first
one to cross the lake, (b) the 4th one to cross the lake ?


^‘3^ a ‘ p “2o* pZ&t! %rr. 


4.22. The logarithmic distribution is given by

$$f(x, \theta) = -\frac{(1-p)^x}{x \log p} \quad \text{for } x = 1, 2, 3, \dots$$

= 0 elsewhere and 0<p<1.
Find E(X) for this distribution.


4.23. (Random Walk) A point moves along a straight line in jumps of
one unit each, starting from a given point 0. The point takes a jump to the
right or to the left with probabilities p and 1-p respectively. Each jump is
independent of all other jumps. If x is the distance from 0 after N jumps,
show that

$$f(x) = \binom{N}{\frac{N+x}{2}}\, p^{(N+x)/2}\, q^{(N-x)/2}$$

for x = ..., -2, -1, 0, 1, 2, ... where q = 1-p.

Assume $\binom{N}{r} = 0$ if r is not an integer. (For more problems of this
nature the reader may refer to W. Feller, An Introduction to Probability
Theory and Its Applications, John Wiley and Sons, New York, 1957.)

4.5. RECTANGULAR OR UNIFORM DISTRIBUTION 


This is a simple continuous probability distribution with
density function

$$f(x, \theta) = \frac{1}{\beta - \alpha} \quad \text{for } \alpha < x < \beta,\ \alpha > 0 \text{ and } \alpha < \beta,$$                          (4.38)

= 0 elsewhere.

In this distribution $\theta = (\alpha, \beta)$, or there are two
parameters $\alpha$ and $\beta$. Fig. 4.2 gives a graphical representation
of the distribution.

Fig. 4.2

Because of the rectangular shape of the distribution it is called a
rectangular distribution.

4.5.1. Moments.

$$\mu'_r = E(X^r) = \int_{-\infty}^{\infty} x^r f(x, \theta)\, dx = \int_{\alpha}^{\beta} x^r\, \frac{dx}{\beta - \alpha} = \frac{\beta^{r+1} - \alpha^{r+1}}{(\beta - \alpha)(r+1)}$$                          (4.39)

For r = 1, 2, ... various raw moments are obtained.

When $\alpha = 0$, a rectangular distribution with one parameter
is obtained. For a rectangular distribution the distribution
function F(x) is given by

$$F(x) = \int_{-\infty}^{x} f(x)\, dx = \frac{x - \alpha}{\beta - \alpha} \quad \text{for } \alpha < x < \beta.$$                          (4.40)


If X is distributed according to the rectangular distribution
given in Section 4.5 then the probability that X is greater than or
equal to d, where d is a given constant with $\alpha < d < \beta$, is
given by

$$\int_{d}^{\infty} f(x)\, dx = \int_{d}^{\beta} \frac{dx}{\beta - \alpha} = \frac{\beta - d}{\beta - \alpha}.$$                          (4.41)

Such a probability statement has great significance in testing
statistical hypotheses, which will be discussed later.

Ex. 4.5.1. The dial of a spinner is marked 1 to 100. The
spinner is balanced in the sense that the indicator, when
rotated, is as likely to stop at one number as at any other number.
What is the probability that in 2 out of 3 trials the indicator stops in
between 20 and 30 ?

Sol. Let X be the distance of the stopping point from
zero (see also Ex. 3.11.3); then X has a rectangular distribution,

f(x) = 1/100 for 0<x<100 and f(x) = 0 elsewhere.

The probability that the indicator stops between 20 and 30
at any trial is

$$\int_{20}^{30} f(x)\, dx = \int_{20}^{30} \frac{1}{100}\, dx = \frac{1}{10} = p \text{ (say)}.$$

Since p remains the same for any trial, the required probability
is given by a Binomial probability law and is

$$= \binom{3}{2} p^2 (1-p) = 3\, (1/10)^2 (9/10) = 27/1000.$$
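The two steps of the spinner example, a rectangular probability followed
by a Binomial probability, can be written out as below. This is an
illustrative sketch; the function name `uniform_interval_prob` is an
assumption introduced for the example.

```python
from math import comb

def uniform_interval_prob(lo, hi, alpha, beta):
    """P(lo < X < hi) when X has a rectangular distribution on (alpha, beta)."""
    lo, hi = max(lo, alpha), min(hi, beta)      # keep only the part inside (alpha, beta)
    return max(hi - lo, 0.0) / (beta - alpha)

p = uniform_interval_prob(20, 30, 0, 100)       # one trial stops between 20 and 30
answer = comb(3, 2) * p**2 * (1 - p)            # exactly 2 such trials out of 3
print(p, round(answer, 3))
```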

4.6. THE EXPONENTIAL DISTRIBUTION

The density function of an exponential distribution is given by

$$f(x, \theta) = \frac{1}{\theta}\, e^{-x/\theta} \quad \text{for } x > 0,\ \theta > 0$$                          (4.42)

= 0 elsewhere.

$\theta$ is the only parameter. A graphical representation
of the distribution is given in Fig. 4.3.

Fig. 4.3

The distribution function F(x) is given by

$$F(x) = \int_{-\infty}^{x} f(x)\, dx = \int_{0}^{x} \frac{1}{\theta}\, e^{-x/\theta}\, dx = 1 - e^{-x/\theta}.$$                          (4.43)

Therefore,

$$1 - F(x) = \int_{x}^{\infty} f(x)\, dx = e^{-x/\theta}.$$                          (4.44)

$$P\{x > d\} = \int_{d}^{\infty} f(x)\, dx = e^{-d/\theta}.$$                          (4.45)

This is given by the shaded area in Fig. 4.3.



Ex. 4.6.1. The daily consumption of milk in a city, in excess
of 10,000 units, is approximately exponentially distributed with
parameter $\theta = 1000$. The city has a daily stock of 20,000 units.
What is the probability that the stock is insufficient for both the days
if two days are selected at random ?

Sol. Let Y denote the consumption on any day. Then

X = Y - 10,000

has an exponential distribution,

$f(x) = \frac{1}{1000}\, e^{-x/1000}$ for $0 < x < \infty$ and 0 elsewhere.

The stock is insufficient if the demand exceeds the stock,
that is, when x > 20,000 - 10,000 = 10,000.

The probability that the stock is insufficient on any particular
day

$$= P\{x > 10{,}000\} = \int_{10{,}000}^{\infty} \frac{1}{1000}\, e^{-x/1000}\, dx = e^{-10}.$$

The required probability $= (e^{-10})^2 = e^{-20}$ (follows from
independence).
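The tail probability of an exponential distribution, $e^{-d/\theta}$, is
simple to evaluate directly. The sketch below applies it to the milk
example, taking $\theta = 1000$, a daily stock of 20,000 and the excess
measured over 10,000 as in the solution; the function name `exp_tail` is an
assumption made for the illustration.

```python
import math

def exp_tail(d, theta):
    """P(X > d) = e^{-d/theta} for an exponential law with parameter theta."""
    return math.exp(-d / theta) if d > 0 else 1.0

one_day = exp_tail(20_000 - 10_000, 1000)   # stock insufficient on a given day
both_days = one_day ** 2                    # two independent days
print(one_day, both_days)
```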

4-7. THE GAMMA DISTRIBUTION 

The density function for this probability distribution is
given as

$$f(x, \theta) = k\, x^{\alpha - 1} e^{-x/\beta} \quad \text{for } x > 0,\ \alpha, \beta > 0$$                          (4.46)

= 0 elsewhere.

k is such that $f(x, \theta)$ is a density function. k may be evaluated
by using the result that for any density function $f(x, \theta)$,

$$\int_{-\infty}^{\infty} f(x, \theta)\, dx = 1.$$

$$\int_{-\infty}^{\infty} f(x, \theta)\, dx = \int_{0}^{\infty} k\, x^{\alpha - 1} e^{-x/\beta}\, dx.$$

Make the substitution $t = x/\beta$; then $dt = dx/\beta$ and

$$k \int_{0}^{\infty} x^{\alpha - 1} e^{-x/\beta}\, dx = k\, \beta^{\alpha} \int_{0}^{\infty} t^{\alpha - 1} e^{-t}\, dt = k\, \beta^{\alpha}\, \Gamma(\alpha).$$

Hence

$$k = \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)}$$                          (4.47)

where $\Gamma(\alpha)$ is the Gamma function.

The Gamma distribution is an important distribution. When
$\alpha = 1$ an exponential distribution is obtained. There are practical
situations where this distribution is applicable. Some of the
results are given in the exercises at the end of this section. When
$\alpha = n/2$ and $\beta = 2$ a Chi-square distribution is obtained. A
graphical representation of the Gamma distribution for various values
of the parameters $\alpha$ and $\beta$ is given in Fig. 4.4.

Fig. 4.4.


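4.71. Moments.

The various raw moments may be easily evaluated.

$$\mu'_r = E(X^r) = \int_{-\infty}^{\infty} x^r f(x, \theta)\, dx = \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)} \int_{0}^{\infty} x^r\, x^{\alpha - 1} e^{-x/\beta}\, dx$$

$$= \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)} \int_{0}^{\infty} x^{r + \alpha - 1} e^{-x/\beta}\, dx.$$

Let us make a substitution $t = x/\beta$; then $dt = dx/\beta$ and

$$\mu'_r = \frac{\beta^{\alpha + r}}{\beta^{\alpha}\, \Gamma(\alpha)} \int_{0}^{\infty} t^{r + \alpha - 1} e^{-t}\, dt = \frac{\beta^{r}}{\Gamma(\alpha)}\, \Gamma(\alpha + r).$$                          (4.48)

For various values of r the various raw moments are
obtained ;

when r=1,
$$\mu'_1 = \frac{\beta}{\Gamma(\alpha)}\, \Gamma(\alpha + 1) = \alpha \beta$$                          (4.49)

when r=2,
$$\mu'_2 = \frac{\beta^2}{\Gamma(\alpha)}\, \Gamma(\alpha + 2) = \beta^2 \alpha (\alpha + 1)$$                          (4.50)

[since $\Gamma(\alpha + 2) = (\alpha + 1)\, \alpha\, \Gamma(\alpha)$].

$$\mathrm{Var}(X) = \mu'_2 - (\mu'_1)^2 = \beta^2 \alpha (\alpha + 1) - \alpha^2 \beta^2 = \alpha \beta^2.$$                          (4.51)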
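The moment formula (4.48) can be compared with a direct numerical
integration of $x^r f(x, \theta)$. The following sketch uses a simple
midpoint rule; the function names, the integration range and the parameter
values $\alpha = 3$, $\beta = 2$ are assumptions chosen for the
illustration.

```python
import math

def gamma_raw_moment(r, alpha, beta):
    """mu'_r = beta^r * Gamma(alpha + r) / Gamma(alpha), as in (4.48)."""
    return beta**r * math.gamma(alpha + r) / math.gamma(alpha)

def gamma_pdf(x, alpha, beta):
    """Gamma density with k = 1 / (beta^alpha * Gamma(alpha))."""
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta**alpha * math.gamma(alpha))

def numeric_moment(r, alpha, beta, upper=100.0, steps=100_000):
    """Midpoint-rule approximation of the integral of x^r f(x) over (0, upper)."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** r * gamma_pdf((i + 0.5) * h, alpha, beta)
                   for i in range(steps))

alpha, beta = 3.0, 2.0
mean = gamma_raw_moment(1, alpha, beta)                 # alpha * beta
var = gamma_raw_moment(2, alpha, beta) - mean**2        # alpha * beta^2
print(mean, var, round(numeric_moment(1, alpha, beta), 4))
```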


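The distribution function F(x) for a Gamma distribution is
given by

$$F(x) = \int_{-\infty}^{x} f(x, \theta)\, dx = 1 - \int_{x}^{\infty} \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}\, dx$$

$$= 1 - \int_{x/\beta}^{\infty} \frac{1}{\Gamma(\alpha)}\, t^{\alpha - 1} e^{-t}\, dt.$$                          (4.52)

The integral is very difficult to evaluate for a general $\alpha$.
The integral $\int_{u}^{\infty} \frac{1}{\Gamma(\alpha)}\, t^{\alpha - 1} e^{-t}\, dt$
is tabulated for various values of u and $\alpha$. These tables are called
the incomplete Gamma tables. A reference is given at the end of this
chapter. By using an incomplete Gamma table we can evaluate the tail areas

$$\left( \text{i.e., } \int_{u}^{\infty} f(x)\, dx \right)$$

or the distribution function

$$\int_{0}^{u} \frac{1}{\Gamma(\alpha)}\, t^{\alpha - 1} e^{-t}\, dt$$

of a Gamma distribution.

The moment generating function is

$$M_X(t) = E\,e^{tX} = \int_{0}^{\infty} \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)}\, e^{tx}\, x^{\alpha - 1} e^{-x/\beta}\, dx = \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)} \int_{0}^{\infty} x^{\alpha - 1} e^{-x \left( \frac{1}{\beta} - t \right)}\, dx.$$

Let $u = x \left( \frac{1}{\beta} - t \right)$; then $du = dx \left( \frac{1}{\beta} - t \right)$ and, for $t < 1/\beta$,

$$M_X(t) = (1 - \beta t)^{-\alpha}$$                          (4.53)

$$= 1 + \alpha\, \frac{(\beta t)}{1!} + \alpha(\alpha + 1)\, \frac{(\beta t)^2}{2!} + \alpha(\alpha + 1)(\alpha + 2)\, \frac{(\beta t)^3}{3!} + \dots$$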


. 4*grsr * rr?^ 1 , * 

WhaulT dy a \ a 0amma ^ribitionwZ fl 2 g T-2 

^•ss^tssrAStia^-^ 

Sol. Let x = increase in sales on any day. X has the
distribution

$$f(x) = \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}$$

where $\alpha = 2$, $\beta = 2$. That is,

$f(x) = \frac{1}{4}\, x\, e^{-x/2}$ for x > 0

= 0 elsewhere.

If the increase in sales tax is to exceed Rs. 100 the increase
in sales is to exceed Rs. 2000. Hence the required probability is

$$= P\{x > 2000\} = \int_{2000}^{\infty} \frac{1}{4}\, x\, e^{-x/2}\, dx$$

$$= \int_{1000}^{\infty} y\, e^{-y}\, dy \quad \text{(by putting } x/2 = y)$$

$$= \left[ -y\, e^{-y} \right]_{1000}^{\infty} + \int_{1000}^{\infty} e^{-y}\, dy \quad \text{(by integrating by parts)}$$

$$= 1000\, e^{-1000} + e^{-1000} = (1001)\, e^{-1000}.$$

Exercises 

4.24. Find the moment generating function for (a) a rectangular
distribution with parameters $\alpha$ and $\beta$, (b) an exponential
distribution with parameter $\theta$.

4.25. Determine the distributions, if possible, from the following
moment generating functions :

* '• 

(a) M x( *)=(l— 2t)'\ (6) M x (t)={l-t)~\ 

(c) M x {t)~e 2t + ta ' (d) M K (t)=e 2tS - 

(See the Normal distribution also) 

4.26. Integrating by parts or otherwise, show that

$\Gamma(\alpha) = (\alpha - 1)\, \Gamma(\alpha - 1)$ for $\alpha > 1$.

4.27. Show that $\Gamma(1/2) = \sqrt{\pi}$.

[Hint. $[\Gamma(1/2)]^2 = \int_{0}^{\infty} \int_{0}^{\infty} x^{-1/2}\, y^{-1/2}\, e^{-(x+y)}\, dx\, dy$.
Change to polar co-ordinates]

4.28. The Beta distribution is defined by the density function

$f(x) = k\, x^{\alpha - 1} (1 - x)^{\beta - 1}$ for 0<x<1, $\alpha > 0$, $\beta > 0$

= 0 elsewhere,

where $k = 1/B(\alpha, \beta) = \Gamma(\alpha + \beta) / [\Gamma(\alpha)\, \Gamma(\beta)]$.

Evaluate (a) E(X), (b) Var(X), (c) P{x > 0.5}, when X is a Beta
variate.
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4.29. The Pearson system of curves are given by- the differential 

equation 

Show that (1) f(x) is the normal density function when 6=c^=0 and 

a> 0 . 

(2) f(x) is a Gamma density function whea~a=c=0, &>0 and d^—b. 

(3) f[X) is a Beta density function when a=0, 6 = —c and c£>L — b. 

4.30. Evaluate the first two cumulants $k_1$ and $k_2$ for a Gamma variate
and show that $k_1 = E(X)$ and $k_2 = \mathrm{Var}(X)$. Also obtain $k_3$.

4.31. If X is a Gamma variable with parameters $\alpha$ and $\beta$, find the
probability that x > 4 when (1) $\alpha = 1$, $\beta = 2$, (2) $\alpha = 2$, $\beta = 4$.

4.32. Suppose that during the rainy season on a tropical island the
length of a shower has an exponential distribution with the parameter
$\theta = 2$, time being measured in minutes. What is the probability that
a shower will last for more than 3 minutes ? If a shower has already lasted
for 2 minutes, what is the probability that it will last for at least one
minute more ?

4.33. The sales tax returns of a salesman are exponentially distributed
with the parameter $\theta = 4$. What is the probability that his sale will
exceed Rs. 10,000, assuming that sales tax is levied at the rate of 5% on
the sales ?

4.34. The annual sales of wheat in millions of bushels by a wheat
board is assumed to be approximately distributed as a gamma distribution
with parameters $\alpha = 3$ and $\beta = 2$. If this wheat board has 20
million bushels of wheat in a particular year, what is the probability that
it won't be able to meet the demand ?

4.8. THE NORMAL DISTRIBUTION 

This is the most important distribution in present day
statistical analysis. This distribution is known by several names
such as Gaussian distribution, error curve etc. The name 'Normal
distribution' is rather unfortunate. This does not in any way
mean that other distributions are abnormal. Data arising from a
good many practical situations are seen to be approximately
normally distributed. A number of distributions can be approximated
by a normal distribution. In many cases a simple transformation
will transform a non-normal distribution to a normal
distribution. Due to many such theoretical and practical reasons
this distribution plays a vital role in statistical analysis. The
density function for a normal distribution is given as

$$f(x, \theta) = k\, e^{-(x - \alpha)^2 / 2\beta^2}, \quad -\infty < x < \infty,\ -\infty < \alpha < \infty,\ \beta > 0$$                          (4.54)

where $\alpha$ and $\beta$ are parameters and k is a constant which can be
evaluated, since

$$\int_{-\infty}^{\infty} f(x, \theta)\, dx = 1.$$


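Evaluation of k:

$$\int_{-\infty}^{\infty} f(x, \theta)\, dx = k \int_{-\infty}^{\infty} e^{-\frac{1}{2} \left( \frac{x - \alpha}{\beta} \right)^2}\, dx = 1.$$

Put $y = \frac{x - \alpha}{\beta}$; then $dx = \beta\, dy$ and $-\infty < y < \infty$.

$$1 = k\, \beta \int_{-\infty}^{\infty} e^{-y^2/2}\, dy.$$

But $e^{-y^2/2}$ is an even function of y and hence

$$1 = k\, \beta\, 2 \int_{0}^{\infty} e^{-y^2/2}\, dy.$$

Put $t = y^2/2$; then $dt = y\, dy$ and $y = (2t)^{1/2}$.

$$1 = k\, \beta\, 2 \int_{0}^{\infty} (2t)^{-1/2} e^{-t}\, dt = k\, \beta\, \sqrt{2} \int_{0}^{\infty} t^{\frac{1}{2} - 1} e^{-t}\, dt$$

$$= k\, \beta\, \sqrt{2}\; \Gamma(1/2) = k\, \beta\, \sqrt{2\pi}$$

[Since $\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha - 1} e^{-x}\, dx$ and $\Gamma(1/2) = \sqrt{\pi}$].

Note. Odd and even functions.

If $\phi(x) = \phi(-x)$ then $\phi(x)$ is called an even function of x, and

$$\int_{-a}^{a} \phi(x)\, dx = 2 \int_{0}^{a} \phi(x)\, dx.$$

If $\phi(x) = -\phi(-x)$ then $\phi(x)$ is called an odd function of x, and

$$\int_{-a}^{a} \phi(x)\, dx = 0.$$

Hence

$$k = \frac{1}{\beta \sqrt{2\pi}}$$                          (4.55)

$$f(x, \theta) = \frac{1}{\beta \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \alpha}{\beta} \right)^2}, \quad -\infty < x < \infty.$$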
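The value of k can be checked numerically by integrating
$e^{-\frac{1}{2}((x - \alpha)/\beta)^2}$ and taking the reciprocal of the
area. The sketch below uses a midpoint rule over a wide finite range; the
function name `integrate`, the range of twelve times $\beta$ on either side
and the parameter values are assumptions made for the illustration.

```python
import math

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

alpha, beta = 1.5, 0.7                      # hypothetical parameter values
area = integrate(lambda x: math.exp(-0.5 * ((x - alpha) / beta) ** 2),
                 alpha - 12 * beta, alpha + 12 * beta)
k = 1 / area                                # k makes the total area unity
print(k, 1 / (beta * math.sqrt(2 * math.pi)))
```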


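4.81. Moments.

$$E(X) = \int_{-\infty}^{\infty} x\, \frac{1}{\beta \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \alpha}{\beta} \right)^2}\, dx.$$

Put $y = \frac{x - \alpha}{\beta}$; then $dx = \beta\, dy$ and $-\infty < y < \infty$.

$$E(X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (\alpha + \beta y)\, e^{-y^2/2}\, dy = \frac{\alpha}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2}\, dy + \frac{\beta}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y\, e^{-y^2/2}\, dy.$$                          (4.56)

But

$$\int_{-\infty}^{\infty} e^{-y^2/2}\, dy = 2 \int_{0}^{\infty} e^{-y^2/2}\, dy \quad [e^{-y^2/2} \text{ is an even function}]$$

and

$$\int_{-\infty}^{\infty} y\, e^{-y^2/2}\, dy = 0 \quad [y\, e^{-y^2/2} \text{ is an odd function}].$$

Hence

$$E(X) = \frac{\alpha}{\sqrt{2\pi}}\, 2 \int_{0}^{\infty} e^{-y^2/2}\, dy = \alpha$$                          (4.57)

[$2 \int_{0}^{\infty} e^{-y^2/2}\, dy = \sqrt{2\pi}$ is evaluated in section 4.8].

$$\mu_2 = \mathrm{Var}(X) = E[X - E(X)]^2 = E[X - \alpha]^2 = \frac{1}{\beta \sqrt{2\pi}} \int_{-\infty}^{\infty} (x - \alpha)^2\, e^{-\frac{1}{2} \left( \frac{x - \alpha}{\beta} \right)^2}\, dx.$$

Putting $y = \frac{x - \alpha}{\beta}$ and then $t = y^2/2$,

$$\sigma^2 = \frac{2 \beta^2}{\sqrt{2\pi}} \int_{0}^{\infty} y^2\, e^{-y^2/2}\, dy = \frac{\beta^2}{\sqrt{2\pi}}\, 2 \sqrt{2} \int_{0}^{\infty} t^{1/2} e^{-t}\, dt$$

$$= \frac{2 \beta^2}{\sqrt{\pi}}\, \Gamma(3/2) = \frac{2 \beta^2}{\sqrt{\pi}} \cdot \frac{1}{2}\, \Gamma(1/2)$$

[Since $\Gamma(\alpha) = (\alpha - 1)\, \Gamma(\alpha - 1)$ and $\Gamma(1/2) = \sqrt{\pi}$]

$$= \beta^2.$$                          (4.58)

For the normal distribution the parameters $\alpha$ and $\beta$ are such
that $\alpha$ is E(X) and $\beta$ is the standard deviation of X. Because of
this the normal distribution is usually given by the
density function

$$f(x, \theta) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$                          (4.59)

where $\mu$ and $\sigma$ are the parameters. This representation reminds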
the readers that the parameters are E(X) and the standard deviation
of X when X is a normal variate. There are several notations
for a normal probability distribution. Some of them are as
follows.

(1) $N(\mu, \sigma)$ (Normal distribution with mean $\mu$ and S.D. $\sigma$)

(2) $N(\mu, \sigma^2)$ (Normal distribution with mean $\mu$ and variance $\sigma^2$)

(3) $X : N(\mu, \sigma)$ (The stochastic variable X follows a Normal
distribution with mean $\mu$ and S.D. $\sigma$)

For example

$X : N(0, \sigma)$ (The s.v. X has a Normal distribution with
mean 0 and S.D. $\sigma$)

i.e.
$$f(x, \theta) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-x^2 / 2\sigma^2}, \quad -\infty < x < \infty,\ \sigma > 0$$                          (4.60)

$X : N(0, 1)$ (The s.v. X has a normal distribution with
mean 0 and S.D. unity)

i.e.
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \quad -\infty < x < \infty$$                          (4.61)

N(0, 1) is called the standard normal distribution since
f(x) here is the density function of a standardized normal variate.

In this book we will use the notation $N(\mu, \sigma)$.
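4.82. The Moment Generating Function. For simplicity
and convenience we will find the M.G.F. for $Y = X - \mu$.

$$M_Y(t) = M_{X - \mu}(t) = e^{-\mu t}\, M_X(t)$$

or

$$M_X(t) = e^{\mu t}\, M_{X - \mu}(t).$$

$$M_{X - \mu}(t) = E\, e^{t(X - \mu)} = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} e^{t(x - \mu)}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}\, dx.$$

Put $x - \mu = y$; then $-\infty < y < \infty$ and

$$M_{X - \mu}(t) = \frac{1}{\sqrt{2\pi}\, \sigma} \int_{-\infty}^{\infty} e^{ty - \frac{1}{2} \frac{y^2}{\sigma^2}}\, dy = \frac{1}{\sqrt{2\pi}\, \sigma} \int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma^2} [y^2 - 2\sigma^2 t y]}\, dy.$$

To complete the square in the exponent we add and subtract
$\frac{1}{2\sigma^2} [\sigma^4 t^2]$ :

$$M_{X - \mu}(t) = \frac{1}{\sqrt{2\pi}\, \sigma} \int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma^2} [y^2 - 2\sigma^2 t y + \sigma^4 t^2] + \frac{t^2 \sigma^2}{2}}\, dy$$

$$= \frac{e^{t^2 \sigma^2 / 2}}{\sqrt{2\pi}\, \sigma} \int_{-\infty}^{\infty} e^{-\frac{1}{2} \left( \frac{y - \sigma^2 t}{\sigma} \right)^2}\, dy.$$

Put $u = \frac{y - \sigma^2 t}{\sigma}$; then $du = \frac{dy}{\sigma}$ and $-\infty < u < \infty$.

$$M_{X - \mu}(t) = \frac{e^{t^2 \sigma^2 / 2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2}\, du = e^{t^2 \sigma^2 / 2}$$                          (4.63)

[Since $\int_{-\infty}^{\infty} e^{-u^2/2}\, du = \sqrt{2\pi}$, see exercise 4.27
at the end of this section and section 4.8].

The M.G.F.

$$M_X(t) = e^{\mu t}\, M_{X - \mu}(t) = e^{\mu t + \frac{t^2 \sigma^2}{2}}.$$                          (4.64)

Cor. 1.
$$M_{X - \mu}(t) = e^{t^2 \sigma^2 / 2}$$                          (4.65)

Cor. 2.
$$M_{\frac{X - \mu}{\sigma}}(t) = e^{t^2 / 2}$$                          (4.66)

i.e. the M.G.F. of the standardized normal variate is
$e^{t^2/2}$. The various raw moments may be obtained from the
M.G.F. by the formula

$$\mu'_r = \left. \frac{d^r}{dt^r}\, M_X(t) \right|_{t=0}.$$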




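4.83. Graphical Representations. Graphical representations
of the distributions

(1) $X : N(\mu, \sigma)$ (2) $X : N(0, \sigma)$ (3) $X : N(0, 1)$

are given in Fig. 4.5.

Fig. 4.5.

(1) $f(x, \theta) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}$.
Maximum ordinate $\frac{1}{\sigma \sqrt{2\pi}}$. Symmetric about the ordinate
at $x = \mu$. Points of inflection at $x = \mu \pm \sigma$.

(2) $f(x, \theta) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-x^2 / 2\sigma^2}$.
Maximum ordinate $\frac{1}{\sigma \sqrt{2\pi}}$. Points of inflection at
$x = \pm \sigma$. Symmetric about the $f(x, \theta)$ axis.

(3) $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$.
Maximum ordinate $\frac{1}{\sqrt{2\pi}}$. Points of inflection at
$x = \pm 1$. Symmetric about the f(x) axis.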

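Let us examine some probability statements. Consider the
following problems :

Ex. 4.8.1. Evaluate

(a) $P\left\{ \left| \frac{x - \mu}{\sigma} \right| \le 1 \right\}$, (b) $P\left\{ \left| \frac{x - \mu}{\sigma} \right| \le 2 \right\}$, (c) $P\left\{ \left| \frac{x - \mu}{\sigma} \right| \le 3 \right\}$

where $X : N(\mu, \sigma)$.

Sol. (a)
$$P\left\{ \left| \frac{x - \mu}{\sigma} \right| \le 1 \right\} = P\{ |x - \mu| \le \sigma \} = P\{ -\sigma \le x - \mu \le \sigma \} = P\{ \mu - \sigma \le x \le \mu + \sigma \}$$

$$= \int_{\mu - \sigma}^{\mu + \sigma} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx = 0.6826 \quad \text{(See the comments)}$$                          (4.67)

This is given by the shaded area in Fig. 4.6 (a).

(b)
$$P\left\{ \left| \frac{x - \mu}{\sigma} \right| \le 2 \right\} = P\{ |x - \mu| \le 2\sigma \} = P\{ -2\sigma \le x - \mu \le 2\sigma \} = P\{ \mu - 2\sigma \le x \le \mu + 2\sigma \}$$

$$= \int_{\mu - 2\sigma}^{\mu + 2\sigma} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx = 0.9544$$                          (4.68)

This probability is given by the shaded area in Fig. 4.6 (b).

(c)
$$P\{ |x - \mu| \le 3\sigma \} = P\{ \mu - 3\sigma \le x \le \mu + 3\sigma \} = \int_{\mu - 3\sigma}^{\mu + 3\sigma} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx = 0.9974$$                          (4.69)

Fig. 4.6 (a), (b), (c)

Comments. These integrals are evaluated by using normal probability
tables. This topic will be discussed in the next section.
Probabilities that $\left| \frac{x - \mu}{\sigma} \right| > 1, 2, 3$ may be
obtained by subtracting the areas in Fig. 4.6 (a), (b), (c) from unity
respectively. Similarly we can evaluate

$$P\left\{ \left| \frac{x - \mu}{\sigma} \right| \le d \right\} \quad \text{or} \quad P\left\{ \left| \frac{x - \mu}{\sigma} \right| > d \right\}$$

where d is any given constant.

The normal distributions $X : N(0, \sigma)$ and $X : N(0, 1)$ may be
studied as special cases of $X : N(\mu, \sigma)$. For example $N(0, \sigma)$ is
$N(\mu, \sigma)$ when $\mu = 0$ etc.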

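4.84. Normal Probability Tables. If we are interested in the probability that a normal variable X, distributed as $N(\mu, \sigma)$, takes on values greater than or equal to a given quantity c, then it may be evaluated as follows.

Put $\frac{x-\mu}{\sigma} = y$; then when $x = c$, $y = \frac{c-\mu}{\sigma}$ and $\frac{c-\mu}{\sigma} \le y < \infty$.

\[ P\{x \ge c\} = \int_{c}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\,dx = \int_{t}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \tag{4.70} \]

where $t = \frac{c-\mu}{\sigma}$. The values of this integral, $\int_{t}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy$, are called the tail areas. In the normal probability tables the area $\int_{0}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy$ is given. But

\[ \int_{t}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy = \frac{1}{2} - \int_{0}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \tag{4.71} \]

This follows from the symmetry of the distribution. By using normal probability tables the tail areas

\[ \int_{t}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \]

may be easily determined. A normal probability table is given at the end of this book.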
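Where a printed table is not at hand, equations (4.70) and (4.71) can be evaluated directly. The sketch below is an illustration under stated assumptions, not the book's method: it uses the standard-library functions `math.erf` and `math.erfc`, and the names `table_area` and `upper_tail` are hypothetical.

```python
import math


def table_area(t: float) -> float:
    """Area under the N(0, 1) curve from 0 to t (the quantity tabulated)."""
    return 0.5 * math.erf(t / math.sqrt(2.0))


def upper_tail(c: float, mu: float, sigma: float) -> float:
    """P{x >= c} for X : N(mu, sigma), following equation (4.70).

    The variable is standardized with t = (c - mu) / sigma and the
    tail area beyond t is computed with the complementary error function.
    """
    t = (c - mu) / sigma
    return 0.5 * math.erfc(t / math.sqrt(2.0))


if __name__ == "__main__":
    # Equation (4.71): tail area = 1/2 - tabulated area, for t >= 0.
    t = 1.0
    print(upper_tail(12.0, 10.0, 2.0), 0.5 - table_area(t))
```

Both printed numbers agree because $c = 12$, $\mu = 10$, $\sigma = 2$ gives $t = 1$.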

Ex. 4.8.2. If $X : N(\mu = 10, \sigma = 2)$ find the probability that x lies between $-3$ and 12.

Sol.
\[
\begin{aligned}
P\{-3 < x < 12\} &= \int_{-3}^{12} \frac{1}{2\sqrt{2\pi}}\, e^{-(1/2)(x-10)^2/4}\,dx \\
&= \int_{(-3-10)/2}^{(12-10)/2} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \quad \left(\text{by putting } y = \frac{x-10}{2}\right) \\
&= \int_{-6.5}^{1} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \\
&= \int_{-6.5}^{0} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy + \int_{0}^{1} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \\
&= \int_{0}^{6.5} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy + \int_{0}^{1} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \quad \text{(from symmetry)} \\
&= 0.5 + 0.3413 = 0.8413
\end{aligned}
\]
(obtained from normal probability tables)

Ex. 4.8.3. The marks for a particular subject, obtained by the students of a university in an examination, are assumed to be approximately normally distributed with mean 70 and standard deviation 5. A student taking this subject is chosen at random. What is the probability that his marks are over 80?

Sol. Let X be the stochastic variable denoting the marks of any student in the set of students under consideration, then

$X : N(\mu = 70, \sigma = 5)$.

The required probability

\[
\begin{aligned}
&= \int_{80}^{\infty} \frac{1}{5\sqrt{2\pi}}\, e^{-(x-70)^2/(2 \cdot 25)}\,dx \\
&= \int_{(80-70)/5}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \\
&= 0.5 - \int_{0}^{2} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \\
&= 0.5 - 0.4772 = 0.0228 \quad \text{(from normal tables)}
\end{aligned}
\]
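The two worked examples can be reproduced without tables. The following is a minimal sketch, assuming Python 3.8 or later so that `statistics.NormalDist` is available; the helper names `prob_between` and `prob_over` are illustrative only.

```python
from statistics import NormalDist


def prob_between(mu: float, sigma: float, a: float, b: float) -> float:
    """P{a < x < b} for X : N(mu, sigma), as a difference of distribution functions."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(b) - dist.cdf(a)


def prob_over(mu: float, sigma: float, c: float) -> float:
    """P{x > c} for X : N(mu, sigma)."""
    return 1.0 - NormalDist(mu=mu, sigma=sigma).cdf(c)


if __name__ == "__main__":
    # Ex. 4.8.2: X : N(10, 2), probability that x lies between -3 and 12.
    print(round(prob_between(10.0, 2.0, -3.0, 12.0), 4))
    # Ex. 4.8.3: marks distributed as N(70, 5), probability of marks over 80.
    print(round(prob_over(70.0, 5.0, 80.0), 4))
```

The computed values agree with the table-based answers 0.8413 and 0.0228 to within the rounding of four-figure tables.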

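Ex. 4.8.4. From the following moment generating functions determine the corresponding probability distributions.

(a) $M_X(t) = e^{2t+2t^2}$, (b) $M_X(t) = e^{8t^2}$

Sol. (a) The M.G.F. for a Normal distribution with parameters $\mu$ and $\sigma$ is given by

\[ M_X(t) = e^{t\mu + t^2\sigma^2/2} \]

From the uniqueness property of the M.G.F.,

\[ e^{2t+2t^2} = e^{t\mu + t^2\sigma^2/2} \Rightarrow \mu = 2 \text{ and } \sigma = 2 \]

The corresponding probability distribution is $N(\mu, \sigma)$ where $\mu = 2$ and $\sigma = 2$.

(b) $M_X(t) = e^{8t^2}$

For a $N(\mu, \sigma)$, $M_X(t) = e^{t\mu + t^2\sigma^2/2}$.

Comparing the expressions $e^{8t^2}$ and $e^{t\mu + t^2\sigma^2/2}$ and using the uniqueness property of the M.G.F., $\mu = 0$ and $\sigma^2/2 = 8$, that is, $\sigma = 4$. The corresponding probability distribution is $N(0, 4)$.

The moment generating functions of some of the most commonly used univariate distributions are given in the following table.

Distribution; probability function $f(x, \theta)$; M.G.F. $M_X(t)$

1. Binomial: $f(x, \theta) = \binom{N}{x} p^x q^{N-x}$ for $x = 0, 1, \ldots, N$ and 0 elsewhere; $M_X(t) = (q + pe^t)^N$.

2. Poisson: $f(x, \theta) = \frac{\lambda^x e^{-\lambda}}{x!}$ for $x = 0, 1, 2, \ldots$ and zero elsewhere; $M_X(t) = e^{\lambda(e^t - 1)}$.

3. Normal: $f(x, \theta) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}$ for $-\infty < x < \infty$; $M_X(t) = e^{t\mu + t^2\sigma^2/2}$.

4. Exponential: $f(x, \theta) = \frac{1}{\theta}\, e^{-x/\theta}$ for $x > 0$ and zero elsewhere; $M_X(t) = (1 - \theta t)^{-1}$.

5. Gamma: $f(x, \theta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}$ for $x > 0$ and zero elsewhere; $M_X(t) = (1 - \beta t)^{-\alpha}$.

6. Chi-square: $f(x, \theta) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}$ for $x > 0$ and zero elsewhere; $M_X(t) = (1 - 2t)^{-k/2}$.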
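The rule that raw moments are derivatives of the M.G.F. at $t = 0$ can be illustrated numerically. The sketch below is only an approximation scheme chosen for this illustration (central finite differences with a hypothetical step `h`); it is applied to the normal M.G.F. of Ex. 4.8.4 (a), for which $\mu'_1 = \mu = 2$ and $\mu'_2 = \sigma^2 + \mu^2 = 8$.

```python
import math
from typing import Callable, Tuple


def normal_mgf(t: float, mu: float, sigma: float) -> float:
    """M.G.F. of N(mu, sigma): exp(t*mu + t^2 * sigma^2 / 2)."""
    return math.exp(t * mu + 0.5 * (t * sigma) ** 2)


def first_two_raw_moments(mgf: Callable[[float], float],
                          h: float = 1e-3) -> Tuple[float, float]:
    """Approximate M'(0) and M''(0) by central finite differences.

    These derivatives are the first and second raw moments.
    The step h is an assumption of this sketch, not part of the theory.
    """
    m1 = (mgf(h) - mgf(-h)) / (2.0 * h)
    m2 = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / (h * h)
    return m1, m2


if __name__ == "__main__":
    m1, m2 = first_two_raw_moments(lambda t: normal_mgf(t, 2.0, 2.0))
    variance = m2 - m1 ** 2
    print(m1, m2, variance)  # close to 2, 8 and 4
```

Recovering the variance as $\mu'_2 - (\mu'_1)^2 \approx 4$ gives back $\sigma = 2$, in agreement with the uniqueness argument used in the example.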


Exercises 

4.35. Let Y = log X. If Y has a normal distribution, X is said to have a log-normal distribution and the density function for a log-normal distribution is given by,

\[ f(x, \theta) = \frac{1}{\beta\sqrt{2\pi}\,x}\, e^{-(\log x - \log \alpha)^2/2\beta^2} \quad \text{for } x > 0,\ \alpha > 0,\ \beta > 0, \]

and $f(x, \theta) = 0$ elsewhere, where $\alpha$ and $\beta$ are parameters. Find E(X) and Var.(X), if they exist.

4.36. A Pareto population is given by the density function,

\[ f(x, \theta) = \frac{p\,\alpha^{p}}{x^{p+1}} \quad \text{for } x > \alpha > 0 \text{ and } p > 0 \]

and $f(x, \theta) = 0$ elsewhere.

Obtain E(X) and Var.(X), if they exist.

4.37. For a normal distribution $N(\mu, \sigma)$ show that $\mu_{2n+1} = 0$ for $n = 0, 1, \ldots$ and $\mu_4/\sigma^4 = 3$, where $\mu_r$ denotes the rth central moment.
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, i 4 ' 3 ?’ „ For a normal distribution with the parameters a.=10 a 
calculate the following probabilities. 


= 2 


9 k /, (o L the P r obabihty that for the normal variate X, x is greater than 
(b) the probability that a; lies between -5 and 5, (c) the probability 
that | a? J < 15, (dj the probability that | x j >2, («) the probability that 
# is less than 3. 


439. 

such that 


For a normal distribution with parameters p, and a=l find t 


(a) P{ —— |x^} = 0*95 

(b) P{a?-K^<«+^}-0-99 

4 40. For a normal distribution with the parameters ^ and a evaluate 
t such that 


F {-«-==?<•} 

Also obtain tiro values t 0 and ^ such that 


that P |-« 0 < f ] ) = 0-95. 

Are t 0 and unique ? 

4.41. Suppose that the heights of Canadian citizens of a certain age group at a particular time are assumed to be approximately normally distributed with the parameters $\mu = 66$ inches and $\sigma = 2$ inches. What is the probability of getting a person in this age group whose height is as large as 70 inches? Above what height can we find the tallest 1% of this set of people?

4.42. If X is a $N(\mu, \sigma)$ then for any given $\alpha$ we can evaluate a t such that

\[ P\left\{-t < \frac{x-\mu}{\sigma} < t\right\} = P\{x - t\sigma < \mu < x + t\sigma\} = 1 - \alpha, \]

that is, $\mu$ is said to lie in the interval $x - t\sigma$ to $x + t\sigma$ with a probability $1 - \alpha$. The diameter of bullets produced in a factory is a normal variate with a standard deviation $\sigma = 0.01$ units. A bullet taken at random from this factory is found to have a diameter 2 units. Obtain an interval estimate of $\mu$ of this normal population, with a probability of 0.95.

Ttl . . 4 ? : in dicator moves from a particular point to either 

The doviation from tins point is a normal vaviabie with parameters ^0 m d 

(2T th;t h 1 18 - t 'r e P ro V 1 ^ llt u y (1) of getting a deviation as large as v 2 
(2) that any deviation will lie between —0 01 and 0 1? 


444 


For a normal distribution N c) evaluate P{ I x 


Ohtnin l.^i* f 4.1 ■ «» iiuuwun p., a) evaluate P{ . T -u I >9„\ 

Pr0babllity by Using Chebyshcv’s ino^ity UfoZt 

Under uf e \„? d T t ^r^,en b NT i “ 1 „^ Stribl,ti0n w,th P^^rotere N end p. 

iir 
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4 9 . CHANGE OF VARIABLE 

Sometimes in statistical analysis we will be interested in the distribution of a function of a given stochastic variable. For example, if X is a normal variate what is the distribution of $X^2$? What is the distribution of $2X + 3$? etc. Sometimes we will need the function $\phi(X)$ such that $Y = \phi(X)$ has a given distribution, where X is a given stochastic variable. In order to answer these problems, we will discuss the theory of change of variables in this section.

Theorem 4.1. If $y = \phi(x)$ is differentiable and either a monotonic increasing or decreasing function of x, then the density function $f_1(y)$ for Y is given by

\[ f_1(y) = f_2(x) \left|\frac{dx}{dy}\right| \tag{4.72} \]

where $f_2(x)$ is the density function for the stochastic variable X.

We will prove the theorem when $\phi(x)$ is an increasing function of x and the proof when $\phi(x)$ is a decreasing function of x is left to the reader. Let $y = \phi(x)$ be a monotonic increasing function of x and let the curve $y = \phi(x)$ be as shown in Fig. 4.7. From Fig. 4.7 it is seen that corresponding to $x = v$ we have $y = u$, or corresponding to $y = u$ we have $x = v$. Let us take the distribution function for Y. By definition the distribution function $F_Y(u)$ of Y is

\[ F_Y(u) = P\{y \le u\} = P\{\phi(x) \le u\} = \int_{-\infty}^{u} f_1(y)\,dy \tag{4.73} \]

Fig. 4.7. The curve $y = \phi(x)$, with the point $x = v$ corresponding to $y = u$.

But from Fig. 4.7 it is seen that

\[ P\{\phi(x) \le u\} = P\{x \le v\} = \int_{-\infty}^{v} f_2(x)\,dx = F_X(v) \text{ say} \tag{4.74} \]

These results hold good for any two corresponding values of u and v.

\[ \therefore\ F_Y(u) = F_X(v) \text{ for any corresponding } u \text{ and } v. \tag{4.75} \]

\[ \therefore\ \frac{d}{du} F_Y(u) = \frac{d}{dv} F_X(v) \cdot \frac{dv}{du} \tag{4.76} \]

If F(z) is the distribution function of a s.v. Z then the density function is $\frac{d}{dz} F(z)$. Equation (4.76) yields,

\[ f_1(y) = f_2(x)\, \frac{dx}{dy} \tag{4.77} \]

(since u and v are any corresponding values of y and x respectively).

When $y = \phi(x)$ is a decreasing function of x, $\frac{dx}{dy}$ is negative and hence the general result may be written as

\[ f_1(y) = f_2(x) \left|\frac{dx}{dy}\right| \tag{4.78} \]

Ex. 4.9.1. If X has the density function

\[ f(x, \theta) = \begin{cases} \dfrac{1}{\theta} & \text{for } 0 < x < \theta,\ \theta > 0 \\ 0 & \text{elsewhere} \end{cases} \]

find the density function for $Y = 2X + 3$.

Sol. $y = 2x + 3$

\[ \frac{dy}{dx} = 2 \Rightarrow \frac{dx}{dy} = \frac{1}{2} \]

\[ f_1(y) = f_2(x) \left|\frac{dx}{dy}\right| \]

\[ f_1(y) = \begin{cases} \dfrac{1}{2\theta} & \text{for } 3 < y < 2\theta + 3 \\ 0 & \text{elsewhere.} \end{cases} \]

Ex. 4.9.2. Given

\[ f(x) = \begin{cases} x\, e^{-x^2/2} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases} \]

obtain the density function of (a) $Y = X^2$, (b) $Y = \log_e X$.

Sol. (a) $y = x^2$

\[ \frac{dy}{dx} = 2x = 2y^{1/2} \Rightarrow \frac{dx}{dy} = \frac{1}{2}\, y^{-1/2} \]

\[
\begin{aligned}
f_1(y) &= f_2(x) \left|\frac{dx}{dy}\right| \\
&= y^{1/2}\, e^{-y/2} \left(\tfrac{1}{2}\, y^{-1/2}\right) \\
&= \tfrac{1}{2}\, e^{-y/2} \quad \text{for } 0 < y < \infty
\end{aligned}
\]
and $f_1(y) = 0$ elsewhere.

(b) $y = \log_e x$

\[ \frac{dy}{dx} = \frac{1}{x} \Rightarrow \frac{dx}{dy} = x = e^{y} \]

\[
\begin{aligned}
f_1(y) &= f_2(x) \left|\frac{dx}{dy}\right| \\
&= x\, e^{-(1/2)x^2} \cdot e^{y} \\
&= e^{y}\, e^{-(1/2)e^{2y}} \cdot e^{y}
\end{aligned}
\]

since $y = \log_e x \Rightarrow e^{y} = x$.

When $x \to 0$, $y \to -\infty$ and when $x \to \infty$, $y \to \infty$.

\[ \therefore\ f_1(y) = e^{2y - (1/2)e^{2y}}, \quad -\infty < y < \infty. \]
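The change of variable formula (4.78) lends itself to a direct numerical check of the worked examples. The following is a minimal sketch under stated assumptions: the function `transformed_density`, the finite-difference step `h` and the value $\theta = 4$ are all choices made for this illustration, and the inverse function $x = \phi^{-1}(y)$ is supplied by the caller.

```python
import math
from typing import Callable


def transformed_density(f_x: Callable[[float], float],
                        inverse: Callable[[float], float],
                        y: float,
                        h: float = 1e-6) -> float:
    """Evaluate f1(y) = f2(x) * |dx/dy| with x = inverse(y).

    dx/dy is approximated by a central difference of the inverse
    function; phi must be monotonic so that the inverse exists.
    """
    dx_dy = (inverse(y + h) - inverse(y - h)) / (2.0 * h)
    return f_x(inverse(y)) * abs(dx_dy)


THETA = 4.0  # hypothetical parameter value for Ex. 4.9.1


def uniform_density(x: float) -> float:
    """Density 1/theta on (0, theta), as in Ex. 4.9.1."""
    return 1.0 / THETA if 0.0 < x < THETA else 0.0


def example_density(x: float) -> float:
    """Density x * exp(-x^2 / 2) for x > 0, as in Ex. 4.9.2."""
    return x * math.exp(-x * x / 2.0) if x > 0.0 else 0.0


if __name__ == "__main__":
    # Ex. 4.9.1: Y = 2X + 3, so x = (y - 3) / 2; expected 1 / (2 * theta).
    print(transformed_density(uniform_density, lambda y: (y - 3.0) / 2.0, 5.0))
    # Ex. 4.9.2 (a): Y = X^2, so x = sqrt(y); expected exp(-y / 2) / 2.
    print(transformed_density(example_density, math.sqrt, 1.5))
    # Ex. 4.9.2 (b): Y = log X, so x = exp(y); expected exp(2y - exp(2y) / 2).
    print(transformed_density(example_density, math.exp, 0.3))
```

Each printed value can be compared with the closed form obtained in the corresponding solution.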


Exercises 

4.46. If X is a standardized normal variable show that $X^2$ is a Gamma variable with parameters $\alpha = 1/2$ and $\beta = 2$.

4.47. If X is a beta variable with parameters $\alpha = m/2$, $\beta = n/2$ show that $Y = nX/[m(1 - X)]$ has an F-distribution with parameters m and n. (The F-distribution is given in section 4.12).

4.48. If X is a standardized normal variate obtain the distribution of $|X|$.

4.49. If f(x) is the density function of a stochastic variable X, show that

\[ y = \int_{-\infty}^{x} f(x)\,dx \]

has a rectangular distribution. The change of variable here is often called the probability integral transformation.

4.50. Given that X has a density function

\[ f(x) = \begin{cases} 2x/3 & \text{for } 0 < x < 1 \\ (3 - x)/3 & \text{for } 1 < x < 3 \\ 0 & \text{elsewhere.} \end{cases} \]

Obtain the distribution of $Y = X^2 + 1$.

• H 4 'x\wli e 7r n8form) ' , lf Mx(,) is the 

«? x“ e giv^ by 8 °“ 6 °° udiliiona ' «>• probability diatribution 

00 


/( * )= 5r j “xi*)®' 11 *- 


CO 


Obtain the density functions from the following M.G.F’s by using the above 
equation (a) M x (<)«e* I*, {b) (1 (c) (d) e 2(e*_l). 

stochastic^variable^X^ under^somi p- 1 18 f ha racteristic function of a 

of Xis given by, ? 6 ® oriera ^ conditions, the probability function 


oo 


t {x)=(l/2n) j 


ixt 


<f> x {t)dt 


- CO 


where *+( l) 1 / 2 and the characteristic function * x {t) of X is <M<)=Ee«x 
(E denotes mathematical expectation). Given the characterise function 

(а) (2e*/3+l/3)«, 

(б) pe lt l(l-q 6 it) where 0<p<l an d q = 1- Pi 
(c) [t itb — e it9 )jit(b — a), (d) e 2t '*-* 2 > 

formula.* 116 COrreS P° ndi ^ probability distributions by using the above 

153.^ Show that a Binomial probability function can be written as 
M 0)= ( * / a:/(1 + 0 )” for * =(U . n > 0<0< co, n—a positive integer. 

probabitifw A ^ 8creto s - x - is said to have a power series distribution if the 
P °bab,hty function is given as /<*, 0)-^(^(Q) where «(•) is nonmegat've 

n °f x,g{Q)= 2 a(x)Q x and x=0, 1, 2.... By assuming special funo- 

X = (J 

n ^ 0r obtain the following distributions, 

< 5 ) Logarithm' 8011 ’ ^ Bm ° mia1 ' (3) Negativ0 Binomial. (4) Geometric, 
’ °S ar ithmic series distribution. 

[Hint: IfrtO)».e then 


CO 


2 a{x)W =e Q a ^ Q x /x ! 
* =0 x=0 
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We get the Poisson distribution. If the probability function of a s.ti. x i a 

given by f(x, G)=k . d x /x, for x=l, 2, . then X is said to have a Logarithmic 

Lies distribution, where k is an appropriate constant. mie 


4-55. If the probability function of a s.v. X is given by 

fix, 6 ) = b{x)e® x lh(0) where xEP> (real line). 



if X is continuous and is 2 b(x)e jX if X is discrete, then X is said to have a 
general exponential type distribution. For the general expinenfcial type, show 

that |r(0)=E(X)=/i , (0)/A(0) where h'(Q)=~^~ h(0). Also obtain as special cases 

Cow 

the (1) Power series (thereby all the distributions in 4.54), (2) Exponential, 
(3) Normal with known variance, (4) Gamma with a known distributions. 
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CHAPTER 5 


MULTIVARIATE DISTRIBUTIONS

5.0. Introduction. In chapters 3 and 4 we discussed probability distributions involving only one stochastic variable, or univariate distributions. In this chapter we will discuss probability distributions involving more than one stochastic variable, or multivariate probability distributions. If the stochastic variables are discrete we will call their joint probability distribution a multivariate discrete distribution and if the stochastic variables are continuous then their joint distribution is called a multivariate continuous distribution. If $X_1$ and $X_2$ are two stochastic variables then their joint probability distribution is called a bivariate distribution. Similarly the joint probability distribution of k stochastic variables $X_1, X_2, \ldots, X_k$ is called a k-variate distribution.

In this chapter also we will follow notations similar to those used in the previous chapters. The joint probability function of k stochastic variables will be denoted by $f(x_1, x_2, \ldots, x_k, \theta)$ where $\theta$ stands for all the parameters in the distribution. The probability function for one stochastic variable $X_i$ will be denoted by $f(x_i, \theta)$, the joint probability function of two variables $X_1$ and $X_2$ will be denoted by $f(x_1, x_2, \theta)$ etc. If there is no parameter in a distribution, $\theta$ will be absent in our notation.

5.1. A BIVARIATE DISTRIBUTION 

In order to introduce the concepts of joint probability distribution, the marginal probability distribution or marginal distribution and conditional distribution, we will consider an example of a bivariate discrete distribution. Let us consider a simple experiment of throwing an unbiased coin twice. The outcome set is

S = {(H, H), (H, T), (T, H), (T, T)}

where H and T denote 'head' and 'tail' respectively. These outcomes may be assigned probability 1/4 each,

i.e., P{(H, H)} = 1/4
P{(T, H)} = 1/4
P{(H, T)} = 1/4
P{(T, T)} = 1/4.

Let us consider the stochastic variables X and Y where X = number of heads and Y = number of tails; then x = 0, 1, 2 and y = 0, 1, 2 and the distributions of X and Y are

X : f(x) = 1/4 for x = 0
= 2/4 for x = 1
= 1/4 for x = 2
= 0 elsewhere

Y : g(y) = 1/4 for y = 0
= 2/4 for y = 1
= 1/4 for y = 2
= 0 elsewhere.

5.11. Joint Distribution. For the example in 5.1 we will evaluate the joint probability function by forming a two-way table as given below.

Table 5.1

  y \ x |   0     1     2   | Total
  ------+-------------------+------
    0   |   0     0    1/4  |  1/4
    1   |   0    1/2    0   |  1/2
    2   |  1/4    0     0   |  1/4
  ------+-------------------+------
  Total |  1/4   1/2   1/4  |

If the joint probability function of X and Y is denoted by f(x, y) then

f(x, y) = 1/4 for x = 2, y = 0
= 1/2 for x = 1, y = 1
= 1/4 for x = 0, y = 2
= 0 elsewhere.

Therefore the joint distribution may be given as
f(0, 2) = 1/4
f(1, 1) = 1/2
f(2, 0) = 1/4
f(x, y) = 0 elsewhere.
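The entries of Table 5.1 can be generated by listing the outcome set and counting. The following is a minimal sketch using only the Python standard library; the variable names are illustrative. Each of the four outcomes is given probability 1/4, and the probability attached to a pair (x, y) is the sum over the outcomes that produce it.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Outcome set of throwing a coin twice: (H, H), (H, T), (T, H), (T, T).
outcomes = list(product("HT", repeat=2))
prob_each = Fraction(1, len(outcomes))  # unbiased coin: 1/4 per outcome

# Joint probability function of X = number of heads, Y = number of tails.
joint = Counter()
for outcome in outcomes:
    x = outcome.count("H")
    y = outcome.count("T")
    joint[(x, y)] += prob_each

if __name__ == "__main__":
    for (x, y), p in sorted(joint.items()):
        print(f"f({x}, {y}) = {p}")
```

Pairs that never occur, such as (0, 0), do not appear in `joint`, which corresponds to f(x, y) = 0 elsewhere.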


INTRODUCTION TO STATISTICAL MATHEMATICS

5.12. Marginal Distribution. In Table 5.1 the probabilities in the margins, i.e., $\sum_x f(x, y)$ and $\sum_y f(x, y)$, give the probability distributions of the stochastic variables Y and X respectively, where $\sum_x$ and $\sum_y$ denote summations with respect to x and y respectively. Therefore the marginal distribution may be defined as follows. If f(x, y) is the joint probability function of the stochastic variables X and Y then,

\[ f(x) = \begin{cases} \sum_y f(x, y), & \text{if X and Y are discrete} \\ \int_y f(x, y)\,dy, & \text{if X and Y are continuous} \end{cases} \tag{5.1} \]

is called the marginal distribution of X. This is the distribution of X alone. Similarly the marginal distribution of Y is given as

\[ g(y) = \begin{cases} \sum_x f(x, y), & \text{if X and Y are discrete} \\ \int_x f(x, y)\,dx, & \text{if X and Y are continuous} \end{cases} \tag{5.2} \]

Here f(x) and g(y) need not have the same functional form.

This definition of marginal distribution may be generalized. If $f(x_1, x_2, \ldots, x_k)$ denotes the joint probability function of the stochastic variables $X_1, X_2, \ldots, X_k$ then the various marginal distributions are given below.

\[ f(x_1) = \begin{cases} \sum_{x_2} \sum_{x_3} \cdots \sum_{x_k} f(x_1, x_2, \ldots, x_k), & \text{if } X_1, \ldots, X_k \text{ are discrete} \\ \int_{x_2} \int_{x_3} \cdots \int_{x_k} f(x_1, x_2, \ldots, x_k)\,dx_2 \ldots dx_k, & \text{if } X_1, \ldots, X_k \text{ are continuous} \end{cases} \tag{5.3} \]

\[ f(x_1, x_2) = \begin{cases} \sum_{x_3} \cdots \sum_{x_k} f(x_1, x_2, \ldots, x_k), & \text{if } X_1, \ldots, X_k \text{ are discrete} \\ \int_{x_3} \cdots \int_{x_k} f(x_1, x_2, \ldots, x_k)\,dx_3 \ldots dx_k, & \text{if } X_1, \ldots, X_k \text{ are continuous} \end{cases} \tag{5.4} \]

etc., where $f(x_1, x_2)$ denotes the joint probability function of $X_1$ and $X_2$. Analogous to the axioms for a probability function in a univariate case we will formulate the axioms for a probability function in a multivariate case as follows. A function $f(x_1, \ldots, x_k)$ satisfying the following conditions is a probability function.

1. $f(x_1, \ldots, x_k) \ge 0$ for all $x_i$, $i = 1, 2, \ldots, k$ (i.e., for all $x_i$, $-\infty < x_i < \infty$, $i = 1, 2, \ldots, k$).

2. $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \ldots, x_k)\,dx_1 \ldots dx_k = 1$, if $X_1, \ldots, X_k$ are continuous,

$\sum_{-\infty < x_1 < \infty} \cdots \sum_{-\infty < x_k < \infty} f(x_1, \ldots, x_k) = 1$, if $X_1, \ldots, X_k$ are discrete.

In this case if A is an event $A = \{(x_1, \ldots, x_k) \mid 0 < x_1 \le a_1, \ldots, 0 < x_k \le a_k\}$ then the probability of the occurrence of A, that is, P(A), is given by

\[ P(A) = \begin{cases} \int_{0}^{a_1} \cdots \int_{0}^{a_k} f(x_1, \ldots, x_k)\,dx_1 \ldots dx_k, & \text{if } X_1, \ldots, X_k \text{ are continuous} \\ \sum_{0 < x_1 \le a_1} \cdots \sum_{0 < x_k \le a_k} f(x_1, \ldots, x_k), & \text{if } X_1, \ldots, X_k \text{ are discrete.} \end{cases} \]

Ex. 5.12.1. Given that

\[ f(x, y) = \begin{cases} k\, e^{-x-2y} & \text{for } x > 0,\ y > 0 \text{ and } k \text{ a constant} \\ 0 & \text{elsewhere,} \end{cases} \]

is a density function. Obtain the marginal distributions or the individual density functions of X and Y.

Sol. f(x, y) is a density function and, therefore,

\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1. \]

That is,

\[ \int_{0}^{\infty} \int_{0}^{\infty} k\, e^{-x-2y}\,dx\,dy = k \int_{0}^{\infty} e^{-x}\,dx \int_{0}^{\infty} e^{-2y}\,dy = 1 \]

Hence $k(1/2) = 1$ or $k = 2$.

Therefore,

\[ f(x, y) = \begin{cases} 2e^{-x-2y} & \text{for } x > 0 \text{ and } y > 0 \\ 0 & \text{elsewhere.} \end{cases} \]

The marginal distribution of X is,

\[ f(x) = \int_{0}^{\infty} 2e^{-x-2y}\,dy = e^{-x} \int_{0}^{\infty} 2e^{-2y}\,dy = e^{-x} \]

Therefore,

\[ f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere.} \end{cases} \]

The marginal distribution of Y, that is, g(y), is given as,

\[ g(y) = \int_{0}^{\infty} 2e^{-x-2y}\,dx = 2e^{-2y} \int_{0}^{\infty} e^{-x}\,dx = 2e^{-2y} \]

That is,

\[ g(y) = \begin{cases} 2e^{-2y} & \text{for } y > 0 \\ 0 & \text{elsewhere.} \end{cases} \]

Ex. 5.12.2. The joint distribution of X and Y is given below:

f(0, 1) = 1/27, f(0, 2) = 5/27, f(0, 3) = 6/27
f(1, 1) = 2/27, f(1, 2) = 4/27, f(1, 3) = 4/27
f(2, 1) = 1/27, f(2, 2) = 2/27, f(2, 3) = 2/27

and f(x, y) is zero elsewhere.

Obtain the marginal distributions of X and Y.

Sol. Here x = 0, 1, 2 and y = 1, 2, 3.

$f(x) = \sum_y f(x, y)$

= 1/27 + 5/27 + 6/27 = 12/27 when x = 0
= 2/27 + 4/27 + 4/27 = 10/27 when x = 1
= 1/27 + 2/27 + 2/27 = 5/27 when x = 2.

f(x) = 12/27 for x = 0
= 10/27 for x = 1
= 5/27 for x = 2
= 0 elsewhere.

$g(y) = \sum_x f(x, y)$

= 1/27 + 2/27 + 1/27 = 4/27 for y = 1
= 5/27 + 4/27 + 2/27 = 11/27 for y = 2
= 6/27 + 4/27 + 2/27 = 12/27 for y = 3

g(y) = 4/27 for y = 1
= 11/27 for y = 2
= 12/27 for y = 3
= 0 elsewhere.
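The summations in definitions (5.1) and (5.2) for the discrete case can be carried out mechanically. The following is a minimal sketch, assuming the joint probability function is stored as a dictionary keyed by (x, y); the function name `marginals` is illustrative. Exact fractions are used so that the results can be compared with the worked solution directly.

```python
from collections import defaultdict
from fractions import Fraction

# Joint distribution of Ex. 5.12.2, numerators over 27.
numerators = {
    (0, 1): 1, (0, 2): 5, (0, 3): 6,
    (1, 1): 2, (1, 2): 4, (1, 3): 4,
    (2, 1): 1, (2, 2): 2, (2, 3): 2,
}
joint = {pair: Fraction(n, 27) for pair, n in numerators.items()}


def marginals(joint_pf):
    """Return (f, g): f sums the joint function over y, g sums it over x."""
    f_x = defaultdict(Fraction)
    g_y = defaultdict(Fraction)
    for (x, y), p in joint_pf.items():
        f_x[x] += p  # definition (5.1), discrete case
        g_y[y] += p  # definition (5.2), discrete case
    return dict(f_x), dict(g_y)


if __name__ == "__main__":
    f_x, g_y = marginals(joint)
    print("f(x):", {x: str(p) for x, p in f_x.items()})
    print("g(y):", {y: str(p) for y, p in g_y.items()})
```

Note that `Fraction` reduces to lowest terms, so 12/27 is displayed as 4/9; the values are equal.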

A graphical representation of the joint distribution is given in Fig. 5.1.



Comments. The marginal probability function of X is the probability distribution along the x-axis in the sense that if the total mass of one unit is to be distributed along the x-axis the masses at x = 0, 1 and 2 give the distribution of X. Here corresponding to x = 0 we have the masses 1/27, 5/27, and 6/27 [that is f(0, 1), f(0, 2) and f(0, 3)]. Hence the total mass at x = 0 will be 1/27 + 5/27 + 6/27 = 12/27. Similarly the mass at x = 1 will be 10/27 and at x = 2 will be 5/27 and thus the total is unity. If the total mass of unity is distributed along the y-axis we get the marginal probability function of Y. If we want the distribution of X along the line y = 2, that is, if a mass of one unit is to be distributed along the line y = 2 in the proportion of the probabilities along y = 2, this is given by the conditional distribution of X given y = 2. This will be discussed in the next section.


5.13. Conditional Distributions. If f(x, y) is the joint probability function of two stochastic variables X and Y then the conditional distribution of X given that y = a is defined as,

\[
f(x \mid y = a) = \frac{f(x, y)}{g(y)} \Big|_{y=a} = \begin{cases} \dfrac{f(x, y)}{\sum_x f(x, y)} \Big|_{y=a} & \text{for } g(a) \ne 0, \text{ when X and Y are discrete} \\[2ex] \dfrac{f(x, y)}{\int_x f(x, y)\,dx} \Big|_{y=a} & \text{for } g(a) = \int_x f(x, y)\,dx \Big|_{y=a} \ne 0, \text{ when X and Y are continuous.} \end{cases} \tag{5.5}
\]

The conditional distribution of X given Y is the ratio of the joint distribution of X and Y to the marginal distribution of Y, evaluated at the given value of y. Similarly

\[ g(y \mid x = b) = \frac{f(x, y)}{f(x)} \Big|_{x=b} \quad \text{if } f(b) \ne 0. \tag{5.6} \]

These definitions may be generalized. If $f(x_1, x_2, \ldots, x_k)$ denotes the joint probability function of the stochastic variables $X_1, X_2, \ldots, X_k$ the various conditional distributions may be given as

\[ f(x_1 \mid x_2, x_3, \ldots, x_k) = \frac{f(x_1, x_2, \ldots, x_k)}{f(x_2, x_3, \ldots, x_k)} \quad \text{if } f(x_2, x_3, \ldots, x_k) \ne 0 \tag{5.7} \]

\[ f(x_1, x_2 \mid x_3, \ldots, x_k) = \frac{f(x_1, x_2, \ldots, x_k)}{f(x_3, \ldots, x_k)} \quad \text{if } f(x_3, \ldots, x_k) \ne 0 \tag{5.8} \]

etc.

($f(x \mid y)$ is read as the probability function of X given Y; $x \mid y$ is not a division but is only a notation.)

Ex. 5.13.1. For the example 5.12.2 obtain the conditional distributions of (a) X given that y = 2, (b) Y given that x = 0.

Sol. (a) When y = 2, f(x, y) is given as f(0, 2) = 5/27, f(1, 2) = 4/27, f(2, 2) = 2/27 and f(x, 2) = 0 elsewhere.

g(y) at y = 2 is 11/27 (evaluated in Ex. 5.12.2).

\[ \therefore\ f(x \mid y = 2) = \frac{f(x, y)}{g(y)} \Big|_{y=2} \]

= (5/27)/(11/27) = 5/11 for x = 0
= (4/27)/(11/27) = 4/11 for x = 1
= (2/27)/(11/27) = 2/11 for x = 2
= 0 elsewhere.

(b)
\[ g(y \mid x = 0) = \frac{f(x, y)}{f(x)} \Big|_{x=0} \]

f(0, y) = 1/27 for y = 1
= 5/27 for y = 2
= 6/27 for y = 3
= 0 elsewhere.

$f(x) \big|_{x=0} = 12/27$

g(y | x = 0) = (1/27)/(12/27) = 1/12 for y = 1
= (5/27)/(12/27) = 5/12 for y = 2
= (6/27)/(12/27) = 6/12 for y = 3
= 0 elsewhere.

g(y | x = 0) denotes the conditional distribution of Y given x = 0.
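Definitions (5.5) and (5.6) in the discrete case amount to dividing one row or column of the joint table by its total. The sketch below is an illustration only; the dictionary layout and the function names `conditional_x_given_y` and `conditional_y_given_x` are assumptions of this example, and the data are those of Ex. 5.12.2.

```python
from fractions import Fraction

numerators = {
    (0, 1): 1, (0, 2): 5, (0, 3): 6,
    (1, 1): 2, (1, 2): 4, (1, 3): 4,
    (2, 1): 1, (2, 2): 2, (2, 3): 2,
}
joint = {pair: Fraction(n, 27) for pair, n in numerators.items()}


def conditional_x_given_y(joint_pf, a):
    """f(x | y = a) = f(x, a) / g(a), defined only when g(a) != 0."""
    g_a = sum(p for (x, y), p in joint_pf.items() if y == a)
    if g_a == 0:
        raise ValueError("g(a) = 0: the conditional distribution is undefined")
    return {x: p / g_a for (x, y), p in joint_pf.items() if y == a}


def conditional_y_given_x(joint_pf, b):
    """g(y | x = b) = f(b, y) / f(b), defined only when f(b) != 0."""
    f_b = sum(p for (x, y), p in joint_pf.items() if x == b)
    if f_b == 0:
        raise ValueError("f(b) = 0: the conditional distribution is undefined")
    return {y: p / f_b for (x, y), p in joint_pf.items() if x == b}


if __name__ == "__main__":
    print(conditional_x_given_y(joint, 2))  # 5/11, 4/11, 2/11
    print(conditional_y_given_x(joint, 0))  # 1/12, 5/12, 6/12
```

Each conditional distribution sums to unity, as a probability function must.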

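Ex. 5.13.2. Given the joint density function

\[ f(x, y) = \begin{cases} \tfrac{2}{3}(x + 1)\, e^{-y}, & 0 < x < 1,\ y > 0 \\ 0 & \text{elsewhere.} \end{cases} \]

Obtain the conditional distributions of (a) X given that y = 1, (b) Y given that x = 1/3.

Sol.
\[ f(x) = \int_{0}^{\infty} \tfrac{2}{3}(x + 1)\, e^{-y}\,dy = \tfrac{2}{3}(x + 1) \int_{0}^{\infty} e^{-y}\,dy \]

\[ = \tfrac{2}{3}(x + 1), \quad 0 < x < 1 \]
and f(x) = 0 elsewhere.

\[ g(y) = \int_{0}^{1} \tfrac{2}{3}(x + 1)\, e^{-y}\,dx = e^{-y} \int_{0}^{1} \tfrac{2}{3}(x + 1)\,dx \]

\[ = e^{-y}, \quad y > 0 \]
and g(y) = 0 elsewhere.

(a)
\[ f(x \mid y = 1) = \frac{f(x, y)}{g(y)} \Big|_{y=1} = \frac{\tfrac{2}{3}(x + 1)\, e^{-1}}{e^{-1}} = 2(x + 1)/3. \]

That is,
\[ f(x \mid y = 1) = \begin{cases} 2(x + 1)/3 & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere.} \end{cases} \]

(b)
\[ g(y \mid x = 1/3) = \frac{f(x, y)}{f(x)} \Big|_{x=1/3} = \frac{e^{-y} \cdot 8/9}{8/9} = e^{-y} \]

That is,
\[ g(y \mid x = 1/3) = \begin{cases} e^{-y} & \text{for } y > 0 \\ 0 & \text{elsewhere.} \end{cases} \]

Comments. Here it may be noticed that the conditional distribution of X is the same as the marginal distribution of X whatever may be the condition imposed on Y; and similarly for the distribution of Y. This is due to the fact that X and Y are independent. Independence of stochastic variables will be discussed later.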
We will introduce one more illustration to bring out the ideas of marginal and conditional distributions. Suppose that a person is throwing darts at a target. Let the target be the center of a 12 x 10 rectangular board, as shown in Fig. 5.2.

Fig. 5.2.

Let us assume that even if he misses the target he will at least hit the board. If we take a rectangular co-ordinate system as shown in Fig. 5.2, any point of hit can be represented by a pair of numbers (x, y). If he has thrown the dart a very large number of times then a good approximation to the probability that he will hit the region A (see Fig. 5.2) is given by the relative frequency (ratio of the total number of points in A to the total number of trials). This board in Fig. 5.2 is given by the rectangle $\{(x, y) \mid -6 < x < 6,\ -5 < y < 5\}$. If this rectangle is divided into small squares, say of one square unit area each, by drawing lines parallel to the x and y-axes and if square pillars of heights equal to the frequencies (number of points in the corresponding squares) are erected over the squares, we get a two dimensional histogram. If this histogram is smoothed by a surface we get a surface of the form Z = f(x, y).

Then, for example, the probability that the dart will hit the region $B = \{(x, y) \mid 0 < x < 2,\ 0 < y < 2\}$ is proportional to

\[ \int_{0}^{2} \int_{0}^{2} f(x, y)\,dx\,dy. \]

If the total volume under the surface is assumed to be one unit then

\[ P(B) = \int_{0}^{2} \int_{0}^{2} f(x, y)\,dx\,dy \]

and

\[ \int_{-5}^{5} \int_{-6}^{6} f(x, y)\,dx\,dy = 1 = \iint_{\Omega} f(x, y)\,dx\,dy \]

where $\Omega$ (omega) $= \{(x, y) \mid -6 < x < 6,\ -5 < y < 5\}$ is the sure event, because we assumed that he will at least hit the board.

If we define two stochastic variables X and Y which assume the values x and y then the joint density function of X and Y is given by f(x, y) for $-6 < x < 6$, $-5 < y < 5$ and f(x, y) = 0 elsewhere.

If we want the probability function f(x) of X, that is, if we want the probability distribution of the x co-ordinates of the points of hit, this is given by $\int_{-5}^{5} f(x, y)\,dy$ since we have assumed that the probability surface is Z = f(x, y). From experimental data we can get an approximation to the curve f(x) as follows. Divide the interval (-6, 6) into a number of small subintervals. Obtain the frequencies (number of points whose x co-ordinates fall in the various subintervals), draw a histogram by erecting rectangles whose areas are proportional to these frequencies and smooth this histogram by a curve. If we want the distribution of all points of hit whose y co-ordinates are 2 (or the distribution of the points of hit along the line y = 2 as shown in Fig. 5.2) this is known as the conditional distribution of X given y = 2 and is denoted by $f(x \mid y = 2)$.

5.14. Distribution Function. If $f(x_1, x_2, \ldots, x_k)$ is the joint probability function of the stochastic variables $X_1, X_2, \ldots, X_k$ then the distribution function or the cumulative probability function $F(a_1, a_2, \ldots, a_k)$ is defined as,

\[
F(a_1, \ldots, a_k) = \begin{cases} \sum_{-\infty < x_1 \le a_1} \sum_{-\infty < x_2 \le a_2} \cdots \sum_{-\infty < x_k \le a_k} f(x_1, x_2, \ldots, x_k), & \text{if } X_1, X_2, \ldots, X_k \text{ are discrete} \\[1ex] \int_{-\infty}^{a_1} \int_{-\infty}^{a_2} \cdots \int_{-\infty}^{a_k} f(x_1, x_2, \ldots, x_k)\,dx_1\,dx_2 \ldots dx_k, & \text{if } X_1, X_2, \ldots, X_k \text{ are continuous.} \end{cases} \tag{5.9}
\]

It may be noticed that $F(\infty, \infty, \ldots, \infty) = 1$ analogous to $F(\infty) = 1$ in the univariate case. $F(a_1, \ldots, a_k) = 0$ if any $a_i = -\infty$.

Ex. 5.14.1. For the following probability distribution
f(1, 1) = 1/8, f(2, 1) = 3/8
f(1, 2) = 2/8, f(2, 2) = 2/8

and f(x, y) = 0 elsewhere, find (a) F(0, 2), (b) F(1, 3), (c) F(3, 5).

Sol. (a)
\[ F(0, 2) = \sum_{-\infty < x \le 0}\ \sum_{-\infty < y \le 2} f(x, y) = 0 \]
(since f(x, y) = 0 except for x = 1, 2 and y = 1, 2).

(b) F(1, 3) = f(1, 1) + f(1, 2) + f(1, 3)
= 1/8 + 2/8 + 0 = 3/8.

(c) F(3, 5) = f(1, 1) + f(1, 2) + f(2, 1) + f(2, 2) = 1.

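Ex. 5.14.2. For the following probability distribution

\[ f(x, y, z) = \begin{cases} x\, e^{-y - z/2}, & 0 < x < 1,\ y > 0,\ z > 0 \\ 0 & \text{elsewhere,} \end{cases} \]

find F(a, b, c), where a, b, c are constants > 0.

Sol.
\[
\begin{aligned}
F(a, b, c) &= \int_{-\infty}^{a} \int_{-\infty}^{b} \int_{-\infty}^{c} f(x, y, z)\,dx\,dy\,dz \\
&= 0 + \int_{0}^{a} \int_{0}^{b} \int_{0}^{c} x\, e^{-y - z/2}\,dx\,dy\,dz \\
&= \int_{0}^{a} x\,dx \int_{0}^{b} e^{-y}\,dy \int_{0}^{c} e^{-z/2}\,dz \\
&= a^2 (1 - e^{-b})(1 - e^{-c/2}), \quad \text{for } 0 < a \le 1.
\end{aligned}
\]

Comments. Evidently if $a > 1$, $b = \infty$ and $c = \infty$ then

\[ F(a, b, c) = \int_{0}^{1} \int_{0}^{\infty} \int_{0}^{\infty} x\, e^{-y - z/2}\,dx\,dy\,dz = 1 \]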
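Definition (5.9) can be illustrated for both worked examples. The sketch below is a hedged illustration: `distribution_function` sums a discrete joint probability function stored as a dictionary (the data of Ex. 5.14.1), and `example_cdf` evaluates the closed form of Ex. 5.14.2, with the added assumption that a is capped at 1 because the density vanishes for x > 1. Both function names are hypothetical.

```python
import math
from fractions import Fraction

# Joint probability function of Ex. 5.14.1.
joint = {
    (1, 1): Fraction(1, 8), (2, 1): Fraction(3, 8),
    (1, 2): Fraction(2, 8), (2, 2): Fraction(2, 8),
}


def distribution_function(joint_pf, a1, a2):
    """F(a1, a2): sum of f(x, y) over all points with x <= a1 and y <= a2."""
    return sum(
        (p for (x, y), p in joint_pf.items() if x <= a1 and y <= a2),
        Fraction(0),
    )


def example_cdf(a: float, b: float, c: float) -> float:
    """F(a, b, c) for Ex. 5.14.2 with a, b, c > 0.

    The density is zero for x > 1, so values of a above 1 add nothing.
    """
    a = min(a, 1.0)
    return a * a * (1.0 - math.exp(-b)) * (1.0 - math.exp(-c / 2.0))


if __name__ == "__main__":
    print(distribution_function(joint, 0, 2))       # part (a)
    print(distribution_function(joint, 1, 3))       # part (b)
    print(distribution_function(joint, 3, 5))       # part (c)
    print(example_cdf(2.0, math.inf, math.inf))     # total probability
```

The last line reproduces the statement in the comments: with $a > 1$ and $b = c = \infty$ the distribution function equals unity.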


Exercises 

5.1. A box contains 3 red marbles and 5 white marbles. Three mar¬ 
bles are taken at random with replacement. On the outcome set of this 
experiment define two bivariate probability distributions and obtain the 
respective probability functions. 

5.2. Consider an experiment of rolling a balanced die twice. Let X 
and Y be the sum and difference rolled ( i.e X=sum of the numbers appear¬ 
ing at the two trials etc.) Obtain the joint probability distribution of X and 
Y. Evaluate the distribution function. Obtain the marginal distributions 
of X and Y. Also, obtain the conditional distribution of X given that y= 4. 

5.3. Given the following bivariate probability distribution,
f(-1, 0) = 1/15, f(-1, 1) = 3/15, f(-1, 2) = 2/15

f(0, 0) = 2/15, f(0, 1) = 2/15, f(0, 2) = 1/15

f(1, 0) = 1/15, f(1, 1) = 1/15, f(1, 2) = 2/15

f(x, y) = 0 elsewhere

obtain (1) the distribution function, (2) the marginal distributions of X and Y, (3) the conditional distribution of X given y = 2.




5.4. G iven the distribution function. 


o for 

*<0, 2/ < 0 



1/12 

a?=0, y«0 ! 

[■6/12 for 

*=1, J/ = l 

3/12 

8 

II 

►—» 

II 

o 

7/12 

35=0, y=2 

4/12 

1 

*«0, j/ = l 

.1 



distribution S^g^ven ®=o! Utl0n ° f X and Y> Also obtain the conditional 


5.5. Given that 

/(», y, z)- f k(ai+l)(y+ 2 )(z+ 8 ), 0 <*< 1 , 0 < t /< 1 , 0 < 2 < 1 , 

(^0 elsewhere 


and & is a constant, is a density function Obtain n\ z. t 9 \ ■ i 

distributions of X, Y and Z conJit.Wi a •? -L J 1 ) *» i 2 ). the marginal 

conditional distribution of Z given x=a and 

5.6. Given that 

/(*, V)= Cjc(tx+3)e-viifoT 0<*<2, t/>0 

lj) elsewhere, where Ic is a constant, 
is a density function. Show that /(«, y)=f(x) . g(y). 

5.3 &nd 5.b. Verify tha?°’ f ° r the probabilifc y distributions in problems 

P (*>0, l}=l_P{ a;< o, 

5.8. Given the density function

f(x, y, z) = 8xyz for 0 < x < 1, 0 < y < 1, 0 < z < 1
= 0 elsewhere,

obtain P{0 < x < 0.5, 0.5 < y < 1, 0.5 < z < 0.75}.

function oilmen 4 Xas 9 " Sity fUnCti °" ° f X “ d 4,16 c “ditional density 

/(«)=§(* + !) for 0 <o;< 1 
= 0 elsewhere 

g(y, x) 


and 


{: 


e~ xy for y> 0 
elsewhere. 


obtain P{ 2 /> 2 ). 

Motion o?' X g^en y he a8 density function of Y aI >d ‘he conditional density 


ffly) 


{ 


\ji for 0 < 2 / < 2 
0 elsewhere 





and 


fix | y) = 



y for x>0 

0 elsewhere. 


respe ctively, obtain P{1 < 

5.2. MOMENTS

The rth and sth product moment about the origin of two stochastic variables X and Y is defined as,

\[
\mu'_{rs} = E(X^r Y^s) = \begin{cases} \sum_x \sum_y x^r y^s f(x, y), & \text{if X and Y are discrete} \\ \int_x \int_y x^r y^s f(x, y)\,dx\,dy, & \text{if X and Y are continuous,} \end{cases} \tag{5.10}
\]

where E denotes 'expectation'.

The rth and sth central moment of X and Y is defined as

\[
\mu_{rs} = E[X - \mu_x]^r [Y - \mu_y]^s = \begin{cases} \sum_x \sum_y (x - \mu_x)^r (y - \mu_y)^s f(x, y), & \text{if X and Y are discrete} \\ \int_x \int_y (x - \mu_x)^r (y - \mu_y)^s f(x, y)\,dx\,dy, & \text{if X and Y are continuous,} \end{cases} \tag{5.11}
\]

where $\mu_x = E(X)$ and $\mu_y = E(Y)$. These definitions can be generalized for a k-variate case.


5.21. Covariance

\[ \mu_{11} = E(X - \mu_x)(Y - \mu_y) \tag{5.12} \]

$\mu_{11}$ is defined as the covariance between X and Y. Evidently when Y is the same as X, the covariance becomes the variance of X. Other notations for the covariance between X and Y are C(X, Y) or C(Y, X), $\sigma_{xy}$, Cov (X, Y) etc. It may be easily seen that

\[
\begin{aligned}
\sigma_{xy} = E(X - \mu_x)(Y - \mu_y) &= E(X \cdot Y) - \mu_x \cdot \mu_y \\
&= E(X \cdot Y) - E(X) \cdot E(Y). \qquad (5.13)
\end{aligned}
\]

Ex. 5.21.1. For the probability distribution in Ex. 5.14.1 find the covariance between X and Y.

Sol. Here f(x) = 3/8 for x = 1
= 5/8 for x = 2
= 0 elsewhere

and g(y) = 4/8 for y = 1
= 4/8 for y = 2
= 0 elsewhere.

E(X) = 1.(3/8) + 2.(5/8) = 13/8
E(Y) = 1.(4/8) + 2.(4/8) = 12/8 = 3/2
E(X.Y) = (1.1)(1/8) + (1.2)(2/8) + (2.1)(3/8) + (2.2)(2/8)
= 1.(1/8) + 2.(2/8) + 2.(3/8) + 4.(2/8) = 19/8.
(Here a dot means a multiplication.)

\[ \sigma_{xy} = E(X \cdot Y) - E(X) \cdot E(Y) = \frac{19}{8} - \frac{13}{8} \cdot \frac{3}{2} = -1/16. \]
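Equation (5.13) can be verified mechanically for this example. The following is a minimal sketch, assuming the joint probability function of Ex. 5.14.1 stored as a dictionary; the helper names `expectation` and `covariance` are illustrative. Exact fractions avoid any rounding.

```python
from fractions import Fraction

# Joint probability function of Ex. 5.14.1.
joint = {
    (1, 1): Fraction(1, 8), (1, 2): Fraction(2, 8),
    (2, 1): Fraction(3, 8), (2, 2): Fraction(2, 8),
}


def expectation(joint_pf, g):
    """E[g(X, Y)] = sum of g(x, y) * f(x, y) over all (x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint_pf.items())


def covariance(joint_pf):
    """Cov(X, Y) = E(XY) - E(X) E(Y), as in equation (5.13)."""
    e_x = expectation(joint_pf, lambda x, y: x)
    e_y = expectation(joint_pf, lambda x, y: y)
    e_xy = expectation(joint_pf, lambda x, y: x * y)
    return e_xy - e_x * e_y


if __name__ == "__main__":
    print(expectation(joint, lambda x, y: x))      # E(X)
    print(expectation(joint, lambda x, y: y))      # E(Y)
    print(expectation(joint, lambda x, y: x * y))  # E(XY)
    print(covariance(joint))                       # covariance
```

The same `expectation` helper gives any product moment of definition (5.10) by passing `lambda x, y: x**r * y**s` for chosen r and s.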

It was seen that the standard deviation is a measure of dispersion in a univariate distribution. Similarly the covariance of X and Y may be taken as a measure of joint dispersion in a bivariate distribution. Depending on the joint probability distribution, the covariance may be positive, negative or zero. When X and Y are independent $\sigma_{xy} = 0$ and this will be discussed later.

5.22. Correlation Coefficient. The correlation between the stochastic variables X and Y may be defined as

\[ \rho \text{ or } \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{E[X - E(X)][Y - E(Y)]}{\{E[X - E(X)]^2\, E[Y - E(Y)]^2\}^{1/2}} \tag{5.14} \]

if $\sigma_x$ and $\sigma_y \ne 0$, where $\rho$ (rho) denotes the correlation coefficient and $\sigma_x$ and $\sigma_y$ denote the standard deviations of X and Y respectively. It may be noticed that $\rho_{xy}$ is a pure coefficient, i.e., it is independent of the units of measurement, scaling etc. of X and Y, since we divide the covariance by the standard deviations of X and Y. $\rho_{xy}$ is a measure of the relationship between X and Y and plays an important role in correlation analysis in Statistics. It can be shown that $-1 \le \rho \le 1$. This correlation coefficient is called a linear correlation coefficient or a simple product moment correlation coefficient. There are other types of correlation coefficients such as partial correlation, multiple correlation, serial correlation, biserial correlation, curvilinear correlation etc., which will not be discussed here. For further reading see the references given at the end of this chapter.

Ex. 5.22.1. Two stochastic variables X and Y are given as
Y = aX + b where a and b are constants, show that $|\rho| = 1$ where $\rho$
is the correlation coefficient between X and Y.

Sol. Y = aX + b.

E(Y) = a.E(X) + b and Var (Y) = $a^2$ Var (X)

Y - E(Y) = a[X - E(X)]

Cov (X, Y) = E[Y - E(Y)][X - E(X)] = E a[X - E(X)]$^2$

= $a \sigma^2$ where $\sigma^2$ = Var (X).

\[ \rho = \frac{\text{Cov}(X, Y)}{\{\text{Var}(X)\cdot\text{Var}(Y)\}^{1/2}} = \frac{a\,\sigma^2}{\{\sigma^2 \cdot a^2\sigma^2\}^{1/2}} = \pm 1. \]

(Since the standard deviation is defined as the positive
square root of the variance.)

\[ \therefore\; |\rho| = 1. \]

Comments. It is seen that when there is a linear relationship
between X and Y, $|\rho| = 1$ or there is perfect correlation.
In the light of this result p may be considered to be a measure of 
linear dependence between X and Y. 
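As a small numerical companion to Ex. 5.22.1, the following sketch computes the squared correlation exactly for Y = aX + b. The marginal distribution of X and the values of a and b are arbitrary choices made for illustration (with a not equal to 0), and the function name `moments_of_linear` is hypothetical.

```python
from fractions import Fraction as F

def moments_of_linear(px, a, b):
    """Return (Cov(X, Y), Var(X), Var(Y)) for Y = aX + b, where px[x] = P(X = x)."""
    e_x = sum(x * p for x, p in px.items())
    e_y = sum((a * x + b) * p for x, p in px.items())
    cov = sum((x - e_x) * (a * x + b - e_y) * p for x, p in px.items())
    var_x = sum((x - e_x) ** 2 * p for x, p in px.items())
    var_y = sum((a * x + b - e_y) ** 2 * p for x, p in px.items())
    return cov, var_x, var_y

# Illustrative marginal for X (assumed here; any non-degenerate one would do).
px = {1: F(3, 8), 2: F(5, 8)}

results = {}
for a in (3, -2):
    cov, var_x, var_y = moments_of_linear(px, a, b=7)
    # rho^2 = Cov^2 / (Var X . Var Y); the sign of rho is the sign of a.
    results[a] = (cov ** 2 / (var_x * var_y), cov > 0)
    print(a, results[a])
```

In both cases the squared correlation is exactly 1, with positive covariance for a > 0 and negative covariance for a < 0.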

Ex. 5.22.2. Given the joint density function of X and Y
as

f(x, y) = $e^{-x-y}$, x > 0, y > 0
        = 0 elsewhere

evaluate the correlation between X and Y.

Sol.

\[ \rho = \frac{E[X-\mu_x][Y-\mu_y]}{\sigma_x \sigma_y} = \frac{E(XY) - E(X)\cdot E(Y)}{\sigma_x \sigma_y}. \]

Here

\[ E(X \cdot Y) = \int_0^\infty \int_0^\infty xy\,e^{-x-y}\,dx\,dy = \int_0^\infty x\,e^{-x}\,dx \int_0^\infty y\,e^{-y}\,dy. \]

\[ E(X)\cdot E(Y) = \int_0^\infty x\,e^{-x}\,dx \int_0^\infty y\,e^{-y}\,dy \]

(since f(x) = $e^{-x}$ for x > 0, 0 elsewhere, and g(y) = $e^{-y}$ for y > 0, 0 elsewhere).

\[ \therefore\; \sigma_{xy} = E(X \cdot Y) - E(X)\cdot E(Y) = 0. \]

Comments. For this particular example it is seen that
$\rho = 0$, because $\sigma_{xy} = 0$. It so happened that in our example $\rho = 0$,
but in general $\rho$ need not be zero. In this example X and Y are
independent and hence $\rho = 0$. Independence of stochastic variables
will be discussed later.

5.23. Conditional Expectation. The conditional expectation
of X given Y is denoted by E(X | Y). This gives the expectation
of X in the conditional distribution of X at the given point
for Y. The conditional distribution of X given y = a is usually
denoted by f(x | y), where

\[ f(x \mid y) = \frac{f(x, y)}{g(y)} \text{ at } y = a, \text{ if } g(a) \neq 0. \]

Therefore the conditional expectation,

\[ E(X \mid Y) = \int_{-\infty}^{\infty} x\,f(x \mid y)\,dx, \text{ if X and Y are continuous,} \]

\[ = \sum_x x\,f(x \mid y), \text{ if X and Y are discrete.} \tag{5.15} \]

In general the conditional expectation of a function of X,
say $\psi(X)$, may be written as

\[ E\{\psi(X) \mid Y\} = \int_x \psi(x)\,f(x \mid y)\,dx, \text{ if X and Y are continuous,} \tag{5.16} \]

\[ = \sum_x \psi(x)\,f(x \mid y), \text{ if X and Y are discrete.} \]

These definitions may be generalized to a general multivariate
distribution. We may notice that E(X | Y) is not a s.v.
but is only a function of y, or a function of a if the condition
is given as y = a.

Ex. 5.23.1. For the following bivariate distribution find the
expectation of $X^2$ given that y = 1.

f(1, 1) = 1/10, f(1, 2) = 3/10

f(2, 1) = 1/10, f(2, 2) = 1/10

f(3, 1) = 2/10, f(3, 2) = 2/10

and f(x, y) = 0 elsewhere.
Sol. The marginal distribution of Y is obtained as
g(y) = 4/10 for y = 1
     = 6/10 for y = 2
     = 0 elsewhere.
g(y) = 4/10 when y = 1.

f(x | y) = f(x, y)/g(y) at y = 1

         = 1/4 for x = 1
         = 1/4 for x = 2
         = 2/4 for x = 3
         = 0 elsewhere.

The conditional expectation of $X^2$ given that y = 1 is
E($X^2$ | Y) = $1^2$(1/4) + $2^2$(1/4) + $3^2$(2/4) = 23/4.
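The discrete form of (5.16) translates almost line by line into code. The following is a minimal sketch for the table of Ex. 5.23.1; the function name `conditional_expectation` and the dictionary layout are choices made here for illustration only.

```python
from fractions import Fraction as F

# Joint probability function of Ex. 5.23.1, keyed by (x, y).
joint = {(1, 1): F(1, 10), (1, 2): F(3, 10),
         (2, 1): F(1, 10), (2, 2): F(1, 10),
         (3, 1): F(2, 10), (3, 2): F(2, 10)}

def conditional_expectation(joint, psi, y_given):
    """E[psi(X) | Y = y_given] for a discrete joint distribution, as in (5.16)."""
    # Marginal g(y) at the given point.
    g = sum(p for (x, y), p in joint.items() if y == y_given)
    if g == 0:
        raise ValueError("g(y) = 0, so the conditional distribution is undefined")
    # Sum of psi(x) f(x | y) with f(x | y) = f(x, y) / g(y).
    return sum(psi(x) * p / g for (x, y), p in joint.items() if y == y_given)

print(conditional_expectation(joint, lambda x: x ** 2, 1))  # 23/4
```

Passing `lambda x: x` instead gives E(X | Y = y), the special case (5.15).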


Ex. 5.23.2. Given the joint density of X, Y and Z as

f(x, y, z) = $\tfrac{1}{2} x\,e^{-y-z}$ for 0 < x < 2, y > 0, z > 0
           = 0 elsewhere,

find the conditional expectation of Y given X and Z.

Sol. The conditional distribution of Y given X and Z is

\[ g(y \mid x, z) = \frac{f(x, y, z)}{h(x, z)} = \frac{x\,e^{-y-z}/2}{\int_0^\infty x\,e^{-y-z}\,dy/2} \]

= $e^{-y}$ for y > 0
= 0 elsewhere,

where h(x, z) is the marginal distribution of X and Z. Therefore,

\[ E(Y \mid X, Z) = \int_0^\infty y\,g(y \mid x, z)\,dy = \int_0^\infty y\,e^{-y}\,dy = 1. \]

Comments. The conditional expectation of a s.v. X is the
E(X) in the conditional distribution of X. The conditional distribution
is illustrated by two examples in sections 5.1 and 5.2.

Exercises 

5.11. Find $\mu'_{20}$, $\mu'_{02}$, $\mu'_{11}$ and $\mu'_{12}$ for the following distribution:
f(-1, 0) = 1/8, f(-1, 1) = 2/8, f(1, 0) = 3/8, f(1, 1) = 2/8 and f(x, y) = 0 elsewhere.

5.12. Obtain $\mu'_{20}$, $\mu'_{11}$ and ... for the following distribution:

f(x, y) = $e^{-y}/2$ for 0 < x < 2, y > 0
        = 0 elsewhere.

5.13. Obtain the product moment correlation coefficient between X
and Y in problem 5.12.

5.14. If C is a constant and X and Y are stochastic variables, show
that (1) E(C | X) = C, (2) E(CY | X) = C E(Y | X).

5.15. If X, Y and Z are stochastic variables with finite means show that
(1) E[(X + Y) | Z] = E(X | Z) + E(Y | Z), (2) E[E(Y | X)] = E(Y), (3) Var Y =
E[Var (Y | X)] + Var [E(Y | X)], when in (2) and (3) E(Y | X) is treated as a
function of a stochastic variable X.

5.16. Using the result in section 1.52 show that the sample correlation
coefficient r is such that $-1 \le r \le 1$.

5.17. Using the result that the variance of a stochastic variable is
never negative, prove that $-1 \le \rho \le 1$ by taking

\[ \text{Var}\left(\frac{X}{\sigma_1} + \frac{Y}{\sigma_2}\right) \text{ and } \text{Var}\left(\frac{X}{\sigma_1} - \frac{Y}{\sigma_2}\right) \]

where $\sigma_1^2$ = Var (X) and $\sigma_2^2$ = Var (Y).

5.18. ... for the ranks ...

\[ r = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2-1)} \]

where $d_i$ is the difference of the ranks of the ith student.
... (Hint: The ranks will be numbers from 1 to n in some order.)
5.19. Obtain the rank correlation between the following ranks. 


X 

1 

1 

i 

3 

2 

4 

6 

5 

y 

2 

5 

3 

1 

4 

6 


5.3. SPECIAL MULTIVARIATE PROBABILITY MODELS 

In this section only a few of the important multivariate
distributions will be discussed. In the univariate case we have seen
that the Binomial distribution is the most important discrete
distribution and that the normal distribution is the most important
continuous distribution. Analogous to the Binomial and univariate
normal, we will discuss the Multinomial and Multivariate Normal
distributions in the following sections.

5.31. The Multinomial Distribution. Consider an experiment
where each trial results in one of k mutually exclusive
outcomes with probabilities $p_1, p_2, \ldots, p_k$, where

\[ \sum_{i=1}^{k} p_i = 1. \]
(For example if a balanced die is rolled once, one and only one of
the numbers 1, 2, 3, 4, 5 or 6 is obtained. Each occurs with probability
1/6 and 1/6 + 1/6 + ... + 1/6 = 1.) If such a trial is repeated
n times what is the probability of getting exactly $x_1$ outcomes of
the first kind, $x_2$ outcomes of the second kind, ... and $x_k$ outcomes
of the kth kind such that $x_1 + x_2 + \cdots + x_k = n$? This can be easily
evaluated by a procedure similar to that in the case of a Binomial
distribution. The probability of getting $x_1, x_2, \ldots, x_k$ in some
specified order is evidently $p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$. Therefore the probability
of getting exactly $x_i$ outcomes of the ith kind for i = 1, 2, ..., k is

\[ f(x_1, x_2, \ldots, x_k, \theta) = \frac{n!}{x_1!\,x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}. \tag{5.17} \]

This may be considered to be the joint probability function of the
stochastic variables $X_1, X_2, \ldots, X_k$ where $X_i$ is the number of outcomes
of the ith kind for i = 1, 2, ..., k. Here $0 \le x_i \le n$ for i = 1, 2, ..., k and

$x_1 + x_2 + \cdots + x_k = n$, $p_1 + p_2 + \cdots + p_k = 1$, $p_i > 0$ for all i.

This distribution has the parameters $p_1, p_2, \ldots, p_k$ and n. But
$p_1 + p_2 + \cdots + p_k = 1$. Therefore there are k parameters, n and
$p_1, p_2, \ldots, p_{k-1}$.

It may be noticed that

\[ \sum_{x_1} \sum_{x_2} \cdots \sum_{x_k} f(x_1, x_2, \ldots, x_k, \theta) = (p_1 + p_2 + \cdots + p_k)^n = 1. \tag{5.18} \]

Here $X_1, X_2, \ldots, X_k$ are said to have a multinomial distribution
with the parameters n and $p_1, p_2, \ldots, p_k$ where $\sum p_i = 1$.

Ex. 5.31.1. It is given that in a community the probabilities
of getting a person having black hair, brown hair, and blonde hair are
0.50, 0.30, 0.20 respectively. If 10 persons are selected at random
from this community what is the probability of getting 4 having black
hair, 2 having brown hair and 4 having blonde hair?

This may be compared to a multinomial probability
situation with n = 10, $p_1$ = 0.50, $p_2$ = 0.30, and $p_3$ = 0.20. Therefore
the required probability is

\[ f(4, 2, 4) = \frac{10!}{4!\,2!\,4!}\,(0.50)^4 (0.30)^2 (0.20)^4 = 0.028. \]
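Formula (5.17) is easy to evaluate directly. The sketch below is one possible implementation, assuming Python 3.8 or later for `math.prod`; the function name `multinomial_pmf` is introduced here for illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability (5.17) of observing counts[i] outcomes of kind i in sum(counts) trials."""
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        # Each intermediate quotient is an integer, so // is exact.
        coeff //= factorial(x)
    return coeff * prod(p ** x for p, x in zip(probs, counts))

# Ex. 5.31.1: 4 black, 2 brown, 4 blonde among 10 persons.
value = multinomial_pmf([4, 2, 4], [0.50, 0.30, 0.20])
print(round(value, 3))  # 0.028
```

With k = 2 the same function reduces to the Binomial probability, which gives a quick sanity check.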

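5.32. The Bivariate Normal Distribution. If X and Y are
two stochastic variables with the joint density function

\[ f(x, y, \theta) = \frac{1}{2\pi\beta_1\beta_2\sqrt{1-\delta^2}} \exp\left\{ -\frac{1}{2(1-\delta^2)} \left[ \left(\frac{x-\alpha_1}{\beta_1}\right)^2 - 2\delta \left(\frac{x-\alpha_1}{\beta_1}\right)\left(\frac{y-\alpha_2}{\beta_2}\right) + \left(\frac{y-\alpha_2}{\beta_2}\right)^2 \right] \right\} \tag{5.19} \]

for $-\infty < x < \infty$, $-\infty < y < \infty$, $\beta_1 > 0$, $\beta_2 > 0$, $-1 < \delta < 1$,
$-\infty < \alpha_1 < \infty$, $-\infty < \alpha_2 < \infty$,
then X and Y are said to have a bivariate normal distribution.
Here $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$ and $\delta$ are parameters. Later we will show that
E(X) = $\alpha_1$, E(Y) = $\alpha_2$, etc.

5.32.1. The marginal distribution of X. The marginal
distribution of X is obtained as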


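\[ f(x, \theta) = \int_{-\infty}^{\infty} f(x, y, \theta)\,dy. \]

Put $u = \dfrac{x-\alpha_1}{\beta_1}$ and $v = \dfrac{y-\alpha_2}{\beta_2}$, so that $dv = \dfrac{dy}{\beta_2}$.

\[ \therefore\; f(x, y, \theta) = \frac{1}{2\pi\beta_1\beta_2\sqrt{1-\delta^2}}\; e^{-\frac{1}{2(1-\delta^2)}[u^2 - 2\delta uv + v^2]} \tag{5.20} \]

\[ f(x, \theta) = \frac{1}{2\pi\beta_1\sqrt{1-\delta^2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2(1-\delta^2)}[u^2 - 2\delta uv + v^2]}\,dv \]

\[ = \frac{e^{-\frac{u^2}{2(1-\delta^2)}}}{2\pi\beta_1\sqrt{1-\delta^2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2(1-\delta^2)}[v^2 - 2\delta uv]}\,dv. \]

But

\[ v^2 - 2\delta uv + \delta^2u^2 - \delta^2u^2 = (v - \delta u)^2 - \delta^2u^2. \tag{5.21} \]

Put

\[ \frac{v - \delta u}{\sqrt{1-\delta^2}} = t \;\Rightarrow\; dv = \sqrt{1-\delta^2}\,dt. \tag{5.22} \]

\[ \therefore\; \int_{-\infty}^{\infty} e^{-\frac{1}{2(1-\delta^2)}[v^2 - 2\delta uv]}\,dv = e^{\frac{\delta^2u^2}{2(1-\delta^2)}} \int_{-\infty}^{\infty} e^{-\frac{(v-\delta u)^2}{2(1-\delta^2)}}\,dv \]

\[ = \sqrt{1-\delta^2}\; e^{\frac{\delta^2u^2}{2(1-\delta^2)}} \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{1-\delta^2}\cdot\sqrt{2\pi}\; e^{\frac{\delta^2u^2}{2(1-\delta^2)}}. \]

\[ \therefore\; f(x, \theta) = \frac{e^{-\frac{u^2}{2(1-\delta^2)}}}{2\pi\beta_1\sqrt{1-\delta^2}}\cdot\sqrt{1-\delta^2}\,\sqrt{2\pi}\; e^{\frac{\delta^2u^2}{2(1-\delta^2)}} = \frac{1}{\beta_1\sqrt{2\pi}}\,e^{-u^2/2} \]

\[ = \frac{1}{\beta_1\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\alpha_1}{\beta_1}\right)^2}, \quad -\infty < x < \infty \tag{5.23} \]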

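i.e. $f(x, \theta)$ is a univariate normal distribution.

In the case of the univariate normal distribution it was seen that
$\alpha_1$ = E(X) and $\beta_1$ = standard deviation of X.

From the symmetry of $f(x, y, \theta)$ it follows that the marginal
distribution of Y is

\[ g(y, \theta) = \frac{1}{\sqrt{2\pi}\,\beta_2}\; e^{-\frac{1}{2}\left(\frac{y-\alpha_2}{\beta_2}\right)^2}, \quad -\infty < y < \infty \tag{5.24} \]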

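5.32.2. Covariance between X and Y.

\[ \sigma_{xy} = E(X-\alpha_1)(Y-\alpha_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x-\alpha_1)(y-\alpha_2)\,\frac{1}{\beta_1\beta_2 2\pi\sqrt{1-\delta^2}}\; e^{-\frac{1}{2(1-\delta^2)}\left[\left(\frac{x-\alpha_1}{\beta_1}\right)^2 - 2\delta\left(\frac{x-\alpha_1}{\beta_1}\right)\left(\frac{y-\alpha_2}{\beta_2}\right) + \left(\frac{y-\alpha_2}{\beta_2}\right)^2\right]}\,dx\,dy. \tag{5.25} \]

Put $u = \dfrac{x-\alpha_1}{\beta_1}$, $v = \dfrac{y-\alpha_2}{\beta_2}$ $\Rightarrow$ $dx\,dy = \beta_1\beta_2\,du\,dv$

(See Section 5.5 for change of variables.)

and

\[ \sigma_{xy} = \frac{\beta_1\beta_2}{2\pi\sqrt{1-\delta^2}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} uv\; e^{-\frac{1}{2(1-\delta^2)}[u^2 - 2\delta uv + v^2]}\,du\,dv \]

\[ = \frac{\beta_1\beta_2}{2\pi\sqrt{1-\delta^2}} \int_{-\infty}^{\infty} v\,e^{-\frac{v^2}{2(1-\delta^2)}} \left[ \int_{-\infty}^{\infty} u\; e^{-\frac{1}{2(1-\delta^2)}[u^2 - 2\delta uv]}\,du \right] dv \]

\[ = \frac{\beta_1\beta_2}{2\pi\sqrt{1-\delta^2}} \int_{-\infty}^{\infty} v\,e^{-\frac{v^2}{2(1-\delta^2)} + \frac{\delta^2v^2}{2(1-\delta^2)}} \left[ \int_{-\infty}^{\infty} u\; e^{-\frac{(u-\delta v)^2}{2(1-\delta^2)}}\,du \right] dv. \tag{5.26} \]

But, putting $t = (u - \delta v)/\sqrt{1-\delta^2}$ (the term in t integrates to zero),

\[ \int_{-\infty}^{\infty} u\; e^{-\frac{(u-\delta v)^2}{2(1-\delta^2)}}\,du = \delta\,v\,\sqrt{1-\delta^2} \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \delta\cdot v\sqrt{1-\delta^2}\cdot\sqrt{2\pi}. \]

\[ \therefore\; \sigma_{xy} = \frac{\delta\,\beta_1\beta_2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} v^2\,e^{-v^2/2}\,dv = \delta\cdot\beta_1\cdot\beta_2. \]

Therefore the covariance between X and Y is $\delta\beta_1\beta_2$ and
the correlation coefficient between X and Y is

\[ \rho = \frac{\delta\cdot\beta_1\cdot\beta_2}{\sigma_x\,\sigma_y} = \frac{\delta\cdot\beta_1\cdot\beta_2}{\beta_1\cdot\beta_2} = \delta. \tag{5.27} \]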


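Because of these properties the parameters in a Bivariate
normal distribution are usually denoted as $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and $\rho$.
Thus the joint density function of X and Y is written as

\[ f(x, y, \theta) = \frac{1}{\sigma_1\sigma_2 2\pi\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right] \right\} \tag{5.28} \]

$-\infty < x < \infty$, $-\infty < y < \infty$, $\sigma_1 > 0$, $\sigma_2 > 0$ and $-1 < \rho < 1$,
$-\infty < \mu_1 < \infty$, $-\infty < \mu_2 < \infty$,

where $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and $\rho$ are parameters and 'exp' denotes
'exponential'. It is seen that
$\mu_1$ = E(X), $\mu_2$ = E(Y), $\sigma_1^2$ = Var (X), $\sigma_2^2$ = Var (Y),
$\rho$ = Cov (X, Y)/$\sigma_1\sigma_2$.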
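The substitution $t = (v - \rho u)/\sqrt{1-\rho^2}$ used in the derivation above also suggests a way to generate bivariate normal pairs from two independent standard normal values. The sketch below is an illustration only; it relies on this standard construction, which the text does not state as an algorithm, and the function names and parameter values are assumptions made here.

```python
import random
from math import sqrt

def sample_bivariate_normal(mu1, mu2, sigma1, sigma2, rho, n, seed=0):
    """Draw n pairs (x, y) with the five parameters of the density (5.28)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)          # standardized x
        t = rng.gauss(0.0, 1.0)          # independent of u
        # Invert t = (v - rho*u)/sqrt(1 - rho^2) to obtain the standardized y.
        v = rho * u + sqrt(1.0 - rho * rho) * t
        pairs.append((mu1 + sigma1 * u, mu2 + sigma2 * v))
    return pairs

def sample_correlation(pairs):
    """Product moment correlation coefficient of the observed pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

pairs = sample_bivariate_normal(10.0, 15.0, 2.0, 3.0, rho=0.5, n=50_000)
r = sample_correlation(pairs)
print(r)  # close to 0.5, up to sampling error
```

The sample correlation should settle near the chosen $\rho$ as n grows, in line with (5.27).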

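5.32.3. The Conditional Distribution of X given Y. The
conditional distribution of X given Y is by definition,

\[ f(x \mid y) = \frac{f(x, y)}{g(y)} \]

where y is a given quantity. We have already seen that the marginal
distribution of Y is N($\mu_2$, $\sigma_2$) when the joint distribution is
given with the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and $\rho$.

\[ \therefore\; f(x \mid y) = \frac{1}{\sigma_1\sigma_2 2\pi\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right] \right\} \tag{5.29} \]

divided by

\[ \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}\left(\frac{y-\mu_2}{\sigma_2}\right)^2 \right\}. \]

This upon simplification will be easily reduced to

\[ f(x \mid y) = \frac{1}{\sigma_1\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2\sigma_1^2(1-\rho^2)} \left[ x - \mu_1 - \rho\,\frac{\sigma_1}{\sigma_2}(y-\mu_2) \right]^2 \right\}, \tag{5.30} \]

$-\infty < x < \infty$. Here y is a given quantity.
Evidently f(x | y) is a normal distribution with mean
$\mu_1 + \rho\,\dfrac{\sigma_1}{\sigma_2}(y-\mu_2)$ and with standard deviation equal to
$\sigma_1\sqrt{1-\rho^2}$.

Similarly it can be easily shown that g(y | x) is a
N($\mu_2 + \rho\,\dfrac{\sigma_2}{\sigma_1}(x-\mu_1)$, $\sigma_2\sqrt{1-\rho^2}$).

If $\rho = 0$ evidently f(x | y) = f(x) and g(y | x) = g(y).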

5.32.4. The Conditional Expectation. In a Bivariate
normal distribution the conditional distributions of X given Y as
well as that of Y given X are given in section 5.32.3. It is seen
that f(x | y) is normal with mean $\mu_1 + \rho\,\dfrac{\sigma_1}{\sigma_2}(y-\mu_2)$
and with standard deviation $\sigma_1\sqrt{1-\rho^2}$.

Therefore,

\[ E(X \mid Y) = \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(y-\mu_2) \]

and

\[ E(Y \mid X) = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x-\mu_1). \tag{5.31} \]

Here $\mu_1$, $\mu_2$, $\rho$, $\sigma_1$ and $\sigma_2$ are constants and hence if the
conditional expectation of X given Y, for various values of y, is
plotted evidently we get a straight line. Similarly the conditional
expectation of Y for various given values of x defines another
straight line. These lines are called the normal regression lines
which will be discussed in detail in the last chapter. Since the
correlation coefficient $\rho = \sigma_{12}/\sigma_1\sigma_2$ where $\sigma_{12}$ is the covariance
between X and Y, these regression lines may be written as:

\[ E(X \mid Y) = \mu_1 + \sigma_{12}(y-\mu_2)/\sigma_2^2 \]

and

\[ E(Y \mid X) = \mu_2 + \sigma_{12}(x-\mu_1)/\sigma_1^2. \tag{5.32} \]
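The two regression lines are simple enough to write as functions. The sketch below is illustrative only: the function names and the numerical parameter values are assumptions made here, and the last lines merely confirm that forms (5.31) and (5.32) agree.

```python
def mean_x_given_y(y, mu1, mu2, sigma1, sigma2, rho):
    """E(X | Y = y) from (5.31)."""
    return mu1 + rho * (sigma1 / sigma2) * (y - mu2)

def mean_y_given_x(x, mu1, mu2, sigma1, sigma2, rho):
    """E(Y | X = x) from (5.31)."""
    return mu2 + rho * (sigma2 / sigma1) * (x - mu1)

# Hypothetical parameter values, chosen only for illustration.
mu1, mu2, sigma1, sigma2, rho = 10.0, 15.0, 2.0, 3.0, 0.5

# The same line written with the covariance sigma12 = rho*sigma1*sigma2, as in (5.32).
sigma12 = rho * sigma1 * sigma2
via_531 = mean_x_given_y(18.0, mu1, mu2, sigma1, sigma2, rho)
via_532 = mu1 + sigma12 * (18.0 - mu2) / sigma2 ** 2
print(via_531, via_532)
```

Both lines pass through the point ($\mu_1$, $\mu_2$), which can be seen by setting y = $\mu_2$ or x = $\mu_1$.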


Ex. 5.32.1. Rockets are fired from a rocket launcher to hit a
bridge 2 units long. If the mid-point of the bridge is taken as the
origin and the direction of the bridge as the x-axis, the distribution
of the points of hit is given to be bivariate normal with the parameters
$\mu_1 = 0 = \mu_2$, $\sigma_1 = 2$, $\sigma_2 = 3$ and $\rho = 0$. What is the probability
that (1) two out of three rockets fired will hit the bridge, (2) a rocket
fired will hit the region $-1 \le x \le 1$, $-2 \le y \le 2$?

Sol. The joint probability function is given to be,

\[ f(x, y) = \frac{1}{2\pi\cdot 2\cdot 3}\; e^{-(x^2/4 + y^2/9)/2}, \quad -\infty < x < \infty,\; -\infty < y < \infty. \]

In (1) we want $\binom{3}{2} p^2 (1-p)$ where p = probability of a hit
= P($-1 \le x \le 1$, $-c \le y \le c$), where 2c is the breadth of the bridge.

\[ p = \int_{-1}^{1} \frac{1}{2\sqrt{2\pi}}\,e^{-x^2/8}\,dx \cdot \int_{-c}^{c} \frac{1}{3\sqrt{2\pi}}\,e^{-y^2/18}\,dy \]

\[ = \left( 2\int_0^{1/2} \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt \right) \left( 2\int_0^{c/3} \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt \right) = 0.3830 \left( 2\int_0^{c/3} \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt \right). \]

In (2) we want P($-1 \le x \le 1$, $-2 \le y \le 2$)

\[ = \int_{-1}^{1}\int_{-2}^{2} f(x, y)\,dy\,dx = \int_{-1}^{1} \frac{e^{-x^2/8}}{2\sqrt{2\pi}}\,dx \cdot \int_{-2}^{2} \frac{e^{-y^2/18}}{3\sqrt{2\pi}}\,dy \]

\[ = \left( 2\int_0^{1/2} \frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz \right) \left( 2\int_0^{2/3} \frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt \right) \]

= 0.1895 approximately

(from normal tables).
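The table lookups in part (2) can be reproduced with the error function, since for a N(0, $\sigma$) variable P($-a \le X \le a$) = erf($a/(\sigma\sqrt{2})$). This is a minimal sketch under that standard identity; the helper name `central_normal_prob` is introduced here, and part (1) is left out because it needs the breadth 2c of the bridge.

```python
from math import erf, sqrt

def central_normal_prob(half_width, sigma):
    """P(-half_width <= X <= half_width) for X distributed as N(0, sigma)."""
    return erf(half_width / (sigma * sqrt(2.0)))

# With rho = 0 the joint probability factorizes into the two marginal probabilities.
p_x = central_normal_prob(1.0, 2.0)   # P(-1 <= X <= 1), sigma_1 = 2
p_y = central_normal_prob(2.0, 3.0)   # P(-2 <= Y <= 2), sigma_2 = 3
p_region = p_x * p_y
print(p_x, p_y, p_region)
```

The product agrees with the tabulated answer of about 0.1895 to three decimal places.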

5.32.5. The bivariate normal surface. The univariate
normal curve N($\mu$, $\sigma$) was seen to be symmetric about the ordinate
at x = $\mu$ and extending from $-\infty$ to $\infty$.

Fig. 5.2.

The bivariate normal surface, meaning the surface
defined by the density function of a Bivariate normal distribution,
... the point ($\mu_1$, $\mu_2$) and which is shown
in Fig. 5.2 and 5.3.

5.33. The Multivariate Normal Distribution. The density
function in a Bivariate normal distribution can be written by
using Matrix notation as follows:

\[ f(x, y, \theta) = \frac{|A|^{1/2}}{(2\pi)^{2/2}}\; e^{-\frac{1}{2}(X-\mu)A(X-\mu)'} \tag{5.33} \]

where $X - \mu = (x-\mu_1,\; y-\mu_2)$

is a vector of order 2. (Here X does not mean that X is a s.v.
This is only a convenient notation here.)

$(X-\mu)'$ is the transpose of $X-\mu$.

\[ A = \begin{bmatrix} \dfrac{1}{\sigma_1^2(1-\rho^2)} & \dfrac{-\rho}{\sigma_1\sigma_2(1-\rho^2)} \\[2ex] \dfrac{-\rho}{\sigma_1\sigma_2(1-\rho^2)} & \dfrac{1}{\sigma_2^2(1-\rho^2)} \end{bmatrix} = V^{-1} \text{ (say)} \]

is a matrix of order 2 and |A| is the determinant of this square
matrix A. V is called the covariance matrix. The theory of matrices
and determinants is given in chapter 1. A k-variate normal
density function may be written as

\[ f(x_1, x_2, \ldots, x_k, \theta) = \frac{|A|^{1/2}}{(2\pi)^{k/2}}\; e^{-\frac{1}{2}(X-\mu)A(X-\mu)'} \tag{5.34} \]

$X - \mu = (x_1-\mu_1,\; x_2-\mu_2,\; \ldots,\; x_k-\mu_k)$

$(X-\mu)'$ is the transpose of $X-\mu$.

$A = V^{-1}$ where V is a square matrix of order k whose
(i, j)th element is the covariance between the stochastic variables
$X_i$ and $X_j$. This means that the diagonal elements of V are the
variances of $X_1, X_2, \ldots, X_k$ and the non-diagonal elements are the
various covariances. $(X-\mu)A(X-\mu)'$ is assumed to be a positive
definite quadratic form, or A is assumed to be positive definite (see
chapter 1).
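For k = 2 the matrix form (5.33) can be compared numerically with the explicit form (5.28). The sketch below inverts the 2 by 2 covariance matrix by hand so that no external library is needed; the function names and the test point are illustrative assumptions, not part of the text.

```python
from math import exp, pi, sqrt

def density_explicit(x, y, mu1, mu2, s1, s2, rho):
    """Bivariate normal density written out term by term, as in (5.28)."""
    u = (x - mu1) / s1
    v = (y - mu2) / s2
    q = (u * u - 2 * rho * u * v + v * v) / (1 - rho * rho)
    return exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho * rho))

def density_matrix_form(x, y, mu1, mu2, s1, s2, rho):
    """Same density via (5.33): |A|^(1/2)/(2 pi) exp(-(X-mu) A (X-mu)'/2), A = V^(-1)."""
    # Covariance matrix V: variances on the diagonal, covariance off the diagonal.
    v11, v12, v22 = s1 * s1, rho * s1 * s2, s2 * s2
    det_v = v11 * v22 - v12 * v12
    # Inverse of a symmetric 2 x 2 matrix.
    a11, a12, a22 = v22 / det_v, -v12 / det_v, v11 / det_v
    det_a = a11 * a22 - a12 * a12
    dx, dy = x - mu1, y - mu2
    quad = a11 * dx * dx + 2 * a12 * dx * dy + a22 * dy * dy
    return sqrt(det_a) / (2 * pi) * exp(-quad / 2)

args = (11.0, 14.0, 10.0, 15.0, 2.0, 3.0, 0.5)
print(density_explicit(*args), density_matrix_form(*args))
```

The two values agree up to rounding, which reflects that the entries of A above are exactly those of $V^{-1}$.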

Exercises 

5.20. The moment generating function in a multivariate distribution
may be defined as

\[ M_{X_1 \ldots X_k}(t_1, \ldots, t_k) = E\,e^{t_1X_1 + \cdots + t_kX_k}. \]

Show that (1) $E(X_i) = \dfrac{\partial}{\partial t_i} M_{X_1 \ldots X_k}(t_1, \ldots, t_k)$ at $t_1 = \cdots = t_k = 0$,

(2) $E(X_iX_j) = \dfrac{\partial}{\partial t_i}\dfrac{\partial}{\partial t_j} M_{X_1 \ldots X_k}(t_1, \ldots, t_k)$ at $t_1 = \cdots = t_k = 0$.

5.21. Obtain the moment generating function for a multinomial
distribution and evaluate the product moments $\mu'_{10}$, $\mu'_{20}$, $\mu'_{02}$, $\mu'_{21}$ between
the ith and the jth variables.

5.22. A bivariate normal distribution has the parameters
$\mu_1 = 30$, $\mu_2 = 20$, $\sigma_1 = 2$, $\sigma_2 = 3$ and $\rho = 0$; obtain
P{$20 < x_1 < 35$, $15 < x_2 < 25$}.

5.23. If $f(x_1, x_2, x_3)$ is the probability function for a trinomial
distribution show that the conditional distribution $f(x_1, x_2 \mid x_3)$ is a binomial
distribution.

5*24. The exponent of a bivariate normal density function is 
-#[(*-10) 8 /4-(»-10)(y-15)/6 + (y-15)V9] 

Find the parameters $\mu_1$, $\mu_2$, $\sigma_1$, $\sigma_2$ and $\rho$, if they exist.

5.25. Shri Chacko is shooting at a target. Let x and y be the
co-ordinates of a hit, taking the center of the target as the origin. Assuming
that (X, Y) has a bivariate normal distribution with the parameters
$\mu_1 = 0 = \mu_2$, $\sigma_1 = 1$, $\sigma_2 = 1$, $\rho = 0$, what is the probability that he will hit
the target (1) if the target is a square with sides equal to 4 units, (2) if the
target is a circle of radius 3.


5.4. INDEPENDENCE OF STOCHASTIC VARIABLES


Two stochastic variables X and Y are said to be independent
if their joint probability function is the product of the marginal
probability functions of X and Y, i.e. according to our notation

\[ f(x, y, \theta) = f(x, \theta)\cdot g(y, \theta) \tag{5.35} \]

where $f(x, \theta)$ and $g(y, \theta)$ are the marginal probability functions of
X and Y respectively.

It is easily noticed that if X and Y are independent

f(x | y) = f(x) and g(y | x) = g(y).

Ex. 5.4.1. Check whether X and Y are independent, if the
joint density function is given as

f(x, y) = $\tfrac{2}{5}(x+2)e^{-y}$, 0 < x < 1, y > 0
        = 0 elsewhere.

Sol.

\[ f(x) = \int_0^\infty \tfrac{2}{5}(x+2)\,e^{-y}\,dy = \tfrac{2}{5}(x+2)\int_0^\infty e^{-y}\,dy \]

= $\tfrac{2}{5}(x+2)$ for 0 < x < 1
= 0 elsewhere.

\[ g(y) = \int_0^1 \tfrac{2}{5}(x+2)\,e^{-y}\,dx = e^{-y}\left[\int_0^1 \tfrac{2}{5}(x+2)\,dx\right] \]

= $e^{-y}$ for y > 0
= 0 elsewhere.

Therefore f(x).g(y) = $\tfrac{2}{5}(x+2)e^{-y}$ = f(x, y).

Hence X and Y are statistically independent.

Comments. Here it is seen that f(x | y) = f(x).
This definition of independence may be generalised for a
number of stochastic variables. The stochastic variables
$X_1, X_2, \ldots, X_k$ are said to be independent if their joint probability
function equals the product of their marginal probability functions.
That is,

\[ f(x_1, x_2, \ldots, x_k) = f_1(x_1) \cdots f_k(x_k) \tag{5.36} \]

where $f_1(x_1), \ldots, f_k(x_k)$ are the marginal probability functions of
$X_1, X_2, \ldots, X_k$ respectively.

Ex. 5.4.2. In the following distribution (1) check for independence
of X and Y, (2) obtain the conditional distribution of X given
y = 1/2, (3) obtain the conditional expectation of X given y = 1/2, (4)
obtain P($0 \le x \le 2$ | y = 1/2).

f(x, y) = 2, 0 < x < 1, 0 < x < y < 1
        = 0 elsewhere.

Sol. The region where f(x, y) has non-zero values is given
by the shaded area in Fig. 5.4.

Fig. 5.4.

Evidently

\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_0^1 \left( \int_0^y 2\,dx \right) dy = \int_0^1 \left( \int_x^1 2\,dy \right) dx = 1. \]

Here x varies from 0 to y and hence the marginal probability
function of Y is,

\[ g(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^y 2\,dx = 2y. \]

Along the y-axis y varies from 0 to 1 and therefore

g(y) = 2y, 0 < y < 1
     = 0 elsewhere.

Similarly the marginal distribution of X is,

\[ f(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_x^1 2\,dy = 2(1-x). \]

Along the x-axis x varies from 0 to 1 and therefore,

f(x) = 2(1 - x), 0 < x < 1
     = 0 elsewhere.

(1) But f(x).g(y) = 2(1 - x)(2y) $\neq$ f(x, y) = 2 and hence X and
Y are not independent.

(2) The conditional distribution of X given y = 1/2 is,

\[ f(x \mid y = \tfrac{1}{2}) = \left.\frac{f(x, y)}{g(y)}\right|_{y=1/2} = \left.\frac{2}{2y}\right|_{y=1/2} = 2. \]

But along y = 1/2, x varies from 0 to 1/2. Hence the distribution
of X along y = 1/2, or the conditional distribution of X given y = 1/2, is

f(x | y = 1/2) = 2, 0 < x < 1/2
               = 0 elsewhere.

(3)

\[ E(X \mid y = \tfrac{1}{2}) = \int_{-\infty}^{\infty} x\,f(x \mid y = \tfrac{1}{2})\,dx = \int_0^{1/2} x \cdot 2\,dx = \tfrac{1}{4}. \]

(4)

\[ P(0 \le x \le 2 \mid y = \tfrac{1}{2}) = \int_0^2 f(x \mid y = \tfrac{1}{2})\,dx = \int_0^{1/2} 2\,dx + \int_{1/2}^{2} 0\,dx = 1. \]

Ex. 5.4.3. If s.v.'s X and Y are independent, show that for
any two events A = {x | a < x < b} and B = {y | c < y < d},

\[ P(A \cap B) = P(A)\cdot P(B). \]

Sol. Let f(x, y), f(x), g(y) be the joint probability and
marginal probability functions of X and Y respectively. Since X
and Y are independent, f(x, y) = f(x).g(y). For convenience we
will assume that X and Y are continuous. The discrete and
mixed cases can be dealt with in a similar way.

\[ P(A) = P(a < x < b) = \int_a^b f(x)\,dx, \qquad P(B) = \int_c^d g(y)\,dy \]

and

\[ P(A \cap B) = \int_a^b \left( \int_c^d f(x, y)\,dy \right) dx = \int_a^b \left( \int_c^d f(x)\cdot g(y)\,dy \right) dx \]

(since X and Y are independent)

\[ = \int_a^b f(x)\,dx \cdot \int_c^d g(y)\,dy = P(A)\cdot P(B). \]

In chapter 2 stochastic independence of events was defined.
Since an event can be considered to be a subset of
the range of a s.v., all the results obtained in chapter 2 can be
obtained as appropriate special cases from the general properties of s.v.'s.

Theorem 5.1. If X and Y are independent stochastic variables
and if $\psi(X)$ and $\phi(Y)$ are functions of X and Y respectively
then,

\[ E[\psi(X)\cdot\phi(Y)] = E[\psi(X)]\cdot E[\phi(Y)], \tag{5.37} \]

where E denotes 'mathematical expectation', and $\psi(X)$ and $\phi(Y)$
are assumed to be stochastic variables.

Proof. Case 1. Let X and Y be continuous.

\[ E[\psi(X)\cdot\phi(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \psi(x)\,\phi(y)\,f(x, y)\,dx\,dy \]

\[ = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \psi(x)\,\phi(y)\,f(x)\cdot g(y)\,dx\,dy \]

(since X and Y are independent)

\[ = \int_{-\infty}^{\infty} \psi(x)\,f(x)\,dx \int_{-\infty}^{\infty} \phi(y)\,g(y)\,dy = E[\psi(X)]\cdot E[\phi(Y)]. \]

The proof when X and Y are discrete is left to the reader.
It is easy to show that $E[\psi(X, Y) + \phi(X, Y)] = E\psi(X, Y) + E\phi(X, Y)$
where $\psi$ and $\phi$ are functions of X and Y.

Corollary 1. The covariance between two stochastic variables
X and Y is zero when X and Y are independent. [The
converse, however, need not be true in general. If X and Y are
normal and if Cov (X, Y) is zero then X and Y can be shown to
be independent.]

Proof. Cov (X, Y) = E[X - E(X)][Y - E(Y)]

= E[X - E(X)].E[Y - E(Y)]

(by theorem 5.1)
= 0 (since E[X - E(X)] = 0 = E[Y - E(Y)]).
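The remark that the converse of Corollary 1 need not hold can be illustrated by a small discrete example. The distribution below is an assumption chosen for illustration and does not come from the text: X takes the values -1, 0, 1 with probability 1/3 each and Y = $X^2$, so that Y is completely determined by X and yet the covariance is zero.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: X uniform on {-1, 0, 1}, Y = X^2.
joint = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

def marginals(joint):
    """Return the marginal probability functions of X and Y as dictionaries."""
    fx, gy = {}, {}
    for (x, y), p in joint.items():
        fx[x] = fx.get(x, 0) + p
        gy[y] = gy.get(y, 0) + p
    return fx, gy

def covariance(joint):
    """Cov(X, Y) = E(XY) - E(X) E(Y)."""
    e_x = sum(x * p for (x, y), p in joint.items())
    e_y = sum(y * p for (x, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - e_x * e_y

def is_independent(joint):
    """Check f(x, y) = f(x) g(y) at every point of the product of the ranges."""
    fx, gy = marginals(joint)
    return all(joint.get((x, y), 0) == fx[x] * gy[y] for x in fx for y in gy)

print(covariance(joint), is_independent(joint))  # 0 False
```

Here f(0, 0) = 1/3 while f(0).g(0) = 1/9, so the factorization (5.35) fails even though the covariance vanishes.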

Exercises 

5.26. Show that, in a bivariate normal distribution the variables are 
independent if and only if p=0. 

5-27. Check whether the variables in the following distributions are 
independent. 

(a) J(x, y, z) = r 2 e~ x ~v-z f or a;>0, y> 0, z>0 
f 0 elsewhere. 

(b) f(x, y) = (x + 1)(2y + 1)/9 for 0 < x < 1, 0 < y < 2
            = 0 elsewhere.

5.28. Obtain the expected value and variance of (1) Z = X + 2Y,
(2) T = X - Y, if (a) X and Y are independent, (b) X and Y are not independent,
where E(X) = $\mu_1$, E(Y) = $\mu_2$, Var (X) = $\sigma_1^2$ and Var (Y) = $\sigma_2^2$.

5.29. If X and Y are independent obtain the covariance between
X + Y and X - Y.

5.5. CHANGE OF VARIABLES 
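In section 4.9 we discussed change of variables in a univariate
case. Here we will consider the change of variables
in a multivariate distribution. If the stochastic variables
$X_1, X_2, \ldots, X_n$ are changed to the stochastic variables
$Y_1, Y_2, \ldots, Y_n$ by the transformation

\[ Y_1 = \phi_1(X_1, X_2, \ldots, X_n) \]
\[ Y_2 = \phi_2(X_1, X_2, \ldots, X_n) \tag{5.38} \]
\[ \vdots \]
\[ Y_n = \phi_n(X_1, X_2, \ldots, X_n) \]

then the joint density of $Y_1, Y_2, \ldots, Y_n$ is given by theorem
5.2 without proof, where $\phi_1, \ldots, \phi_n$ are functions of $X_1, X_2, \ldots, X_n$.

Theorem 5.2. If $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ are as
defined in (5.38) then

\[ f_1(x_1, x_2, \ldots, x_n) = f_2(y_1, y_2, \ldots, y_n)\,|J| \tag{5.39} \]

where $f_1$ and $f_2$ are the joint density functions
of $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ respectively and |J| is the
absolute value of the Jacobian J, where J is the following determinant:

\[ J = \begin{vmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} & \cdots & \dfrac{\partial y_n}{\partial x_1} \\[2ex] \dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_n}{\partial x_2} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial y_1}{\partial x_n} & \dfrac{\partial y_2}{\partial x_n} & \cdots & \dfrac{\partial y_n}{\partial x_n} \end{vmatrix} \]

In this theorem the differentiability of $Y_1, Y_2, \ldots, Y_n$ with respect to
$X_1, X_2, \ldots, X_n$ etc. and some general conditions on the variables are assumed.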

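Ex. 5.5.1. Given the joint density function of $X_1$ and $X_2$ as

f($x_1$, $x_2$) = $\tfrac{1}{2} x_1 e^{-x_2}$ for 0 < $x_1$ < 2, $x_2$ > 0
            = 0 elsewhere

find the distribution of $X_1 + X_2$.

Sol. Let us consider a transformation

$y_1 = x_1 + x_2$ and $y_2 = x_2$.

\[ J = \begin{vmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} \\[2ex] \dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ 1 & 1 \end{vmatrix} = 1 \]

\[ \therefore\; f(y_1, y_2) = \tfrac{1}{2}(y_1 - y_2)\,e^{-y_2}. \]

The region of integration is given by $0 < y_1 - y_2 < 2$ and $y_2 > 0$,
where $y_1 = x_1 + x_2$ and $y_2 = x_2$.

The density function $f(y_1)$ of $Y_1$ is, for $0 < y_1 \le 2$,

\[ f(y_1) = \int_0^{y_1} \tfrac{1}{2}(y_1 - y_2)\,e^{-y_2}\,dy_2 = \tfrac{1}{2}\left[ y_1 - 1 + e^{-y_1} \right]. \]

For $2 < y_1 < \infty$,

\[ f(y_1) = \int_{y_1-2}^{y_1} \tfrac{1}{2}(y_1 - y_2)\,e^{-y_2}\,dy_2 = \tfrac{1}{2}\,e^{-y_1}(1 + e^2). \]

Hence the required density function is

f(x) = $\tfrac{1}{2}(e^{-x} + x - 1)$ for $0 < x \le 2$
     = $\tfrac{1}{2}\,e^{-x}(1 + e^2)$ for $2 < x < \infty$
     = 0 elsewhere.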
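The closed form obtained in Ex. 5.5.1 can be checked numerically by integrating the transformed joint density over $y_2$ with a simple midpoint rule. This is only a rough sketch: the function names, the step count and the test points are arbitrary choices made here.

```python
from math import exp

def joint_density(x1, x2):
    """Joint density of Ex. 5.5.1: x1 e^(-x2)/2 on 0 < x1 < 2, x2 > 0."""
    return 0.5 * x1 * exp(-x2) if 0 < x1 < 2 and x2 > 0 else 0.0

def density_of_sum_numeric(y1, steps=20_000):
    """Integrate f(y1 - y2, y2) over 0 < y2 < y1 (|J| = 1) by the midpoint rule."""
    h = y1 / steps
    return h * sum(joint_density(y1 - (i + 0.5) * h, (i + 0.5) * h)
                   for i in range(steps))

def density_of_sum_closed(y1):
    """Closed form derived in the example for the density of X1 + X2."""
    if 0 < y1 <= 2:
        return 0.5 * (y1 - 1 + exp(-y1))
    if y1 > 2:
        return 0.5 * exp(-y1) * (1 + exp(2))
    return 0.0

for y1 in (0.5, 1.5, 3.0, 5.0):
    print(y1, density_of_sum_numeric(y1), density_of_sum_closed(y1))
```

The zero values returned by `joint_density` outside 0 < x1 < 2 take care of the limits of integration automatically, which mirrors the role of the region of integration in the hand computation.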

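Ex. 5.5.2. Given a bivariate rectangular distribution as

f($x_1$, $x_2$) = 1/2 for 0 < $x_1$ < 1 and 0 < $x_2$ < 2
            = 0 elsewhere,

find the distribution of $X_1 + X_2$.

Sol. Let $y_1 = x_1 + x_2$, $y_2 = x_1$

$\Rightarrow$ $x_1 = y_2$, $x_2 = y_1 - y_2$.

The Jacobian of the transformation is

\[ J = \begin{vmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} \\[2ex] \dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} \end{vmatrix} = \begin{vmatrix} 1 & 1 \\ 1 & 0 \end{vmatrix} = -1, \quad |J| = 1. \]

\[ \therefore\; f(y_1, y_2) = f_1(x_1, x_2) = \tfrac{1}{2} \]

\[ f(y_1) = \int_{y_2} f(y_1, y_2)\,dy_2 = \int_{y_2} \tfrac{1}{2}\,dy_2. \]

The region of integration is given in Fig. 5.6.

Fig. 5.6.

\[ f(y_1) = \int_0^{y_1} \tfrac{1}{2}\,dy_2 = \tfrac{1}{2}\,y_1 \text{ for } 0 < y_1 \le 1 \]

\[ = \int_0^1 \tfrac{1}{2}\,dy_2 = \tfrac{1}{2} \text{ for } 1 < y_1 \le 2 \]

\[ = \int_{y_1-2}^{1} \tfrac{1}{2}\,dy_2 = \tfrac{1}{2}(3 - y_1) \text{ for } 2 < y_1 < 3. \]

Hence the density function of $X_1 + X_2$ is

f(x) = (1/2)(x) for $0 < x \le 1$
     = 1/2 for $1 < x \le 2$
     = (1/2)(3 - x) for $2 < x < 3$
     = 0 elsewhere.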

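Ex. 5.5.3. Two independent stochastic variables $X_1$ and $X_2$
have Gamma distributions with parameters $\alpha = m/2$, $\beta = 2$ and $\alpha = n/2$,
$\beta = 2$ respectively. Find the distribution of $\dfrac{X_1/m}{X_2/n}$.

Sol. Since $X_1$ and $X_2$ are independent, the joint density
function of $X_1$ and $X_2$ is $f(x_1, x_2) = f_1(x_1)\cdot f_2(x_2)$ where $f_1$ and $f_2$ are
the density functions of $X_1$ and $X_2$ respectively,

\[ \text{i.e. } f(x_1, x_2) = \frac{1}{2^{m/2}\,\Gamma(m/2)}\,x_1^{\frac{m}{2}-1} e^{-x_1/2} \cdot \frac{1}{2^{n/2}\,\Gamma(n/2)}\,x_2^{\frac{n}{2}-1} e^{-x_2/2} \]

\[ = \frac{x_1^{\frac{m}{2}-1}\,x_2^{\frac{n}{2}-1}\,e^{-(x_1+x_2)/2}}{2^{(m+n)/2}\,\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)} \text{ for } x_1 > 0,\; x_2 > 0 \]

= 0 elsewhere.

Let us make the transformation

\[ y_1 = \frac{x_1/m}{x_2/n}, \qquad y_2 = x_2, \]

so that $x_1 = \dfrac{m}{n}\,y_1y_2$ and

\[ J = \begin{vmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} \\[2ex] \dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} \end{vmatrix} = \frac{n}{m\,x_2} = \frac{n}{m\,y_2}. \]

\[ \therefore\; g(y_1, y_2) = f(x_1, x_2)\,\frac{1}{|J|} = \frac{1}{2^{(m+n)/2}\,\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)} \left(\frac{m}{n}\,y_1y_2\right)^{\frac{m}{2}-1} y_2^{\frac{n}{2}-1}\; e^{-\frac{1}{2}\left[\frac{m}{n}y_1y_2 + y_2\right]}\; \frac{m}{n}\,y_2 \]

\[ = \frac{\left(\frac{m}{n}\right)^{\frac{m}{2}}}{2^{(m+n)/2}\,\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)}\; y_1^{\frac{m}{2}-1}\; y_2^{\frac{m+n}{2}-1}\; e^{-\frac{y_2}{2}\left(1 + \frac{m}{n}y_1\right)} \]

where $g(y_1, y_2)$ is the joint density function of $Y_1$ and $Y_2$.

\[ f(y_1) = \int_0^\infty g(y_1, y_2)\,dy_2 = \frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{\frac{m}{2}} y_1^{\frac{m}{2}-1} \left(1 + \frac{m}{n}\,y_1\right)^{-\frac{m+n}{2}}, \quad 0 < y_1 < \infty. \]

Comments. This particular s.v., which is the ratio of two
independent $\chi^2$'s divided by their corresponding degrees of
freedom, is called an F-statistic and has an F-distribution given by
$f(y_1)$. This distribution will be discussed in detail in the chapter
on 'Sampling from Normal Distributions'.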
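A quick numerical check that the density $f(y_1)$ of Ex. 5.5.3 has total probability 1 is sketched below for one pair of degrees of freedom. The values m = 4, n = 10, the truncation of the range at 200 and the helper names are all assumptions made for illustration.

```python
from math import gamma

def f_density(y, m, n):
    """Density of (X1/m)/(X2/n) as derived in Ex. 5.5.3; zero for y <= 0."""
    if y <= 0:
        return 0.0
    const = gamma((m + n) / 2) / (gamma(m / 2) * gamma(n / 2)) * (m / n) ** (m / 2)
    return const * y ** (m / 2 - 1) * (1 + m * y / n) ** (-(m + n) / 2)

def integrate(func, lower, upper, steps=200_000):
    """Midpoint rule on [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))

# The tail beyond 200 is negligible for these degrees of freedom.
total = integrate(lambda y: f_density(y, 4, 10), 0.0, 200.0)
print(total)  # should be close to 1
```

For small n the tail of the density is much heavier, so a larger upper limit (or a change of variable) would be needed to reach the same accuracy.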

Exercises 

5.30. If (X v X 3 ) has a bivariate normal distribution with the nn™ 
meters jtj, n 2 Cl , a, and p, obtain the distribution of (T 2 , T 2 ) where, 1 

TT 2 = (l —p 2 )"l/2[(X 1 — (i,)/cT 2 2 —(Xj— ^ 1 )/a{], 
f-Z^bhin^distributiSof distrib “ Hm Parameters „ /2 , 


Co 


Y=(»/m) ( ^ x ) . 

m Pare the distribution of Y with the F-distribution given in section 4.12. 
5.32. If $X_1, X_2, \ldots, X_n$ are independently and identically distributed as
a Cauchy distribution (section 4.12) with $\theta = 0$, show that
has a Cauchy distribution.

5.33. If $X_1, \ldots, X_n$ are independently and identically distributed as
a N($\mu$, $\sigma$) obtain the distribution of the sample mean $\bar{X} = (X_1 + \cdots + X_n)/n$ by
using the method of change of variables.
CHAPTER 6


STOCHASTIC PROCESSES AND SAMPLING

6.0. Introduction. A stochastic process, in general, can
be defined as a collection of stochastic variables. In the last two
chapters we discussed s.v.'s and studied some special probability
distributions. The study of a collection of stochastic variables having
special properties is very useful in many branches of applied
statistical mathematics. Most of the applications of this particular
study of a collection of s.v.'s are in the description and
analysis of the development of a random quantity over time.
Hence the name 'stochastic process' is often identified with a
collection of s.v.'s X(t) where $t \in T$ and T is a set of real numbers.
In our discussion we will not be interested in a process over time,
but we will simply consider a collection of s.v.'s having some
common properties. For a study of s.v.'s over time the reader is
advised to see any book on Stochastic Processes.

6.1. SUMS OF STOCHASTIC VARIABLES

In the statistical theory of distributions and the testing of
hypotheses, sums and linear combinations of s.v.'s play an important
role. So we will investigate some of the properties of sums of s.v.'s
before defining special collections of s.v.'s such as a simple random
sample etc.

Theorem 6.1. If $X_1, X_2, \ldots, X_k$ are k independent s.v.'s with
M.G.F.'s $M_{X_i}(t)$ for i = 1, 2, ..., k and if $Y = X_1 + X_2 + \cdots + X_k$ then the
M.G.F. of Y is the product of the M.G.F.'s of $X_1, X_2, \ldots, X_k$,

\[ \text{i.e. } M_Y(t) = M_{X_1}(t)\,M_{X_2}(t) \cdots M_{X_k}(t) = \prod_{i=1}^{k} M_{X_i}(t) \tag{6.1} \]

where $\prod$ is a notation for products. For example

\[ \prod_{i=1}^{n} a_i = a_1 a_2 \cdots a_n. \]

Proof.

\[ M_Y(t) = E\,e^{tY} = E\,e^{t(X_1 + X_2 + \cdots + X_k)} = E\left(e^{tX_1}\cdot e^{tX_2} \cdots e^{tX_k}\right) \]

\[ = E\,e^{tX_1}\cdot E\,e^{tX_2} \cdots E\,e^{tX_k} \]

(by applying theorem 5.1 repeatedly)

\[ = M_{X_1}(t)\cdot M_{X_2}(t) \cdots M_{X_k}(t) = \prod_{i=1}^{k} M_{X_i}(t). \]

Ex. 6.1.1. Examine the distribution of ...+r 

when (a) X< has a Binomial distribution with 'parameters N { and p 

for i=l[2, . k, (6) X t has a Poisson distribution with parameter 

^ for i=l, 2, . k, (c) X t has a Oamma distribution with para¬ 
meters cti and (3 for i=l, 2, . k, (d) X* has a normal distribution 

with parameters pt and for i=l, 2, ...k, where X lf X 2 ,„—, X k 
are independent. 

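Sol. (a) $M_{X_i}(t) = (q + p e^t)^{N_i}$, $i = 1, 2, \ldots, k$.

Since the X's are independent, by Theorem 5.2
\[ M_Y(t) = (q + p e^t)^{N_1} (q + p e^t)^{N_2} \cdots (q + p e^t)^{N_k} = (q + p e^t)^{N_1 + N_2 + \cdots + N_k} \tag{6.2} \]

By the uniqueness property of the M.G.F., Y has a Binomial distribution with parameters $N = N_1 + N_2 + \cdots + N_k$ and $p$.

(b) $M_{X_i}(t) = e^{\lambda_i (e^t - 1)}$

\[ M_Y(t) = \prod_{i=1}^{k} M_{X_i}(t) = e^{(\lambda_1 + \lambda_2 + \cdots + \lambda_k)(e^t - 1)} \quad \text{(by theorem 6.1)} \tag{6.3} \]

Therefore Y has a Poisson distribution with the parameter $(\lambda_1 + \lambda_2 + \cdots + \lambda_k)$.

(c) $M_{X_i}(t) = (1 - \beta t)^{-\alpha_i}$

\[ M_Y(t) = (1 - \beta t)^{-\alpha_1 - \alpha_2 - \cdots - \alpha_k} \quad \text{(by theorem 5.2)} \tag{6.4} \]

Y has a Gamma distribution with parameters $(\alpha_1 + \alpha_2 + \cdots + \alpha_k)$ and $\beta$.

(d) $M_{X_i}(t) = e^{t\mu_i + t^2 \sigma_i^2 / 2}$

\[ M_Y(t) = \prod_{i=1}^{k} e^{t\mu_i + t^2 \sigma_i^2 / 2} = e^{t(\mu_1 + \cdots + \mu_k) + t^2 (\sigma_1^2 + \cdots + \sigma_k^2)/2} \]

Y has a normal distribution with mean $\mu_1 + \mu_2 + \cdots + \mu_k$ and standard deviation equal to the square root of $(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2)$, or variance $\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_k^2$. For any s.v's $X_1, X_2, \ldots, X_k$ the probability function of $Y = X_1 + X_2 + \cdots + X_k$ is also called the convolution of $X_1, X_2, \ldots, X_k$.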
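The Poisson case of Ex. 6.1.1 can be checked numerically without M.G.F's, by convolving the probability functions directly. The following is a minimal sketch in Python; the parameter values and the helper names `poisson_pmf` and `convolve` are illustrative assumptions, not part of the text.

```python
import math

def poisson_pmf(lam, y):
    """P(X = y) for a Poisson s.v. with parameter lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

def convolve(p, q):
    """Probability function of the sum of two independent s.v's on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

lambdas = [1.0, 2.5, 0.5]   # illustrative parameters lambda_1, lambda_2, lambda_3
cutoff = 60                 # each probability function is tabulated on 0..cutoff-1

pmf_sum = [1.0]             # probability function of the constant 0
for lam in lambdas:
    pmf_sum = convolve(pmf_sum, [poisson_pmf(lam, y) for y in range(cutoff)])

# Poisson probability function with parameter lambda_1 + lambda_2 + lambda_3
direct = [poisson_pmf(sum(lambdas), y) for y in range(20)]
max_gap = max(abs(a - b) for a, b in zip(pmf_sum, direct))
print(max_gap)              # agreement up to rounding error
```

The first 20 values of the convolution only involve terms below the cutoff, so the truncation does not affect the comparison.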

6.2. LINEAR COMBINATION OF STOCHASTIC VARIABLES 

Let $Y = a_1 X_1 + a_2 X_2 + \cdots + a_k X_k$

where $a_1, \ldots, a_k$ are constants. Y is called a linear combination of the stochastic variables $X_1, X_2, \ldots, X_k$.

Let $E(X_i) = \mu_i$ and $\mathrm{Var}(X_i) = \sigma_i^2$ for $i = 1, 2, \ldots, k$.

\[ E(Y) = E[a_1 X_1 + a_2 X_2 + \cdots + a_k X_k] \]
\[ = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_k E(X_k) \quad \text{(the proof of this step is left to the reader)} \]
\[ = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_k \mu_k = \sum_{i=1}^{k} a_i \mu_i \tag{6.7} \]

\[ \mathrm{Var}(Y) = E[Y - E(Y)]^2 \]
\[ = E[a_1 X_1 + \cdots + a_k X_k - a_1 \mu_1 - \cdots - a_k \mu_k]^2 \]
\[ = E[a_1 (X_1 - \mu_1) + a_2 (X_2 - \mu_2) + \cdots + a_k (X_k - \mu_k)]^2 \]
\[ = E\Big[\sum_{i=1}^{k} a_i^2 (X_i - \mu_i)^2 + \sum_{i \ne j} a_i a_j (X_i - \mu_i)(X_j - \mu_j)\Big] \]
\[ = \sum_{i=1}^{k} a_i^2 E(X_i - \mu_i)^2 + \sum_{i \ne j} a_i a_j\, \mathrm{Cov}(X_i, X_j) \]
\[ = \sum_{i=1}^{k} a_i^2\, \mathrm{Var}(X_i) + \sum_{i \ne j} a_i a_j\, \mathrm{Cov}(X_i, X_j) \tag{6.8} \]

This may also be written as
\[ \mathrm{Var}(Y) = \sum_{i=1}^{k} a_i^2\, \mathrm{Var}(X_i) + 2 \sum_{i < j} a_i a_j\, \mathrm{Cov}(X_i, X_j) \tag{6.9} \]

Note. When the X's are independent the various covariances are zeros and hence
\[ \mathrm{Var}(Y) = \sum_{i=1}^{k} a_i^2\, \mathrm{Var}(X_i). \tag{6.10} \]

$\sum_{i \ne j} a_i a_j\, \mathrm{Cov}(X_i, X_j)$ means the sum of all the terms where $i \ne j$, that is, all terms except those of the form $a_1^2\, \mathrm{Cov}(X_1, X_1)$, $a_2^2\, \mathrm{Cov}(X_2, X_2)$, etc.

\[ \mathrm{Cov}(X_i, X_j) = \rho_{ij}\, \sigma_i \sigma_j \]
(where $\rho_{ij}$ denotes the correlation between $X_i$ and $X_j$). Var (Y) may also be written as
\[ \mathrm{Var}(Y) = \sum_{i=1}^{k} a_i^2 \sigma_i^2 + \sum_{i \ne j} a_i a_j\, \rho_{ij}\, \sigma_i \sigma_j . \]
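Formulas (6.8) and (6.9) can be compared on a small numerical case. The sketch below is only an illustration: the constants `a` and the covariance table `cov` are made-up values, and the function names are hypothetical.

```python
def var_by_6_8(a, cov):
    """Var(Y) as in (6.8): diagonal terms plus the sum over all i != j."""
    k = len(a)
    diag = sum(a[i] ** 2 * cov[i][i] for i in range(k))
    off = sum(a[i] * a[j] * cov[i][j]
              for i in range(k) for j in range(k) if i != j)
    return diag + off

def var_by_6_9(a, cov):
    """Var(Y) as in (6.9): diagonal terms plus twice the sum over i < j."""
    k = len(a)
    diag = sum(a[i] ** 2 * cov[i][i] for i in range(k))
    upper = sum(a[i] * a[j] * cov[i][j]
                for i in range(k) for j in range(i + 1, k))
    return diag + 2 * upper

a = [1.0, -2.0, 0.5]            # illustrative constants a_1, a_2, a_3
cov = [[4.0, 1.0, 0.0],         # cov[i][j] = Cov(X_i, X_j); cov[i][i] = Var(X_i)
       [1.0, 9.0, -1.5],
       [0.0, -1.5, 1.0]]

v68 = var_by_6_8(a, cov)
v69 = var_by_6_9(a, cov)
# Independent case (6.10): all covariances set to zero.
v_indep = sum(a[i] ** 2 * cov[i][i] for i in range(len(a)))
print(v68, v69, v_indep)
```

Both forms give the same value because $\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i)$, so each pair $i < j$ is counted twice in the sum over $i \ne j$.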

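Ex. 6.2.1. Evaluate the expected value and standard deviation of $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$ where $X_1, X_2, \ldots, X_n$ are independent and $E(X_i) = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$ for $i = 1, 2, \ldots, n$ ($\bar{X}$ is read as X bar).

Sol.
\[ E(\bar{X}) = E[(X_1 + X_2 + \cdots + X_n)/n] = \frac{1}{n}[E(X_1) + E(X_2) + \cdots + E(X_n)] \]
\[ = \frac{1}{n}[\mu + \mu + \cdots + \mu] = n\mu/n = \mu. \tag{6.11} \]

\[ \mathrm{Var}(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) + 0 = \frac{1}{n^2}[\sigma^2 + \sigma^2 + \cdots + \sigma^2] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \]

The standard deviation of $\bar{X}$ is
\[ \sigma/\sqrt{n}. \tag{6.12} \]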


Exercises 


6.1. If $X_1, X_2, \ldots, X_n$ are independently and identically distributed point Binomial variates show that $Y = X_1 + \cdots + X_n$ is a Binomial variate. If X is a s.v. taking only values 1 and 0 with probabilities $p$ and $1 - p$ respectively then X is called a point Binomial variate. Show that

\[ \frac{Y - np}{\sqrt{np(1 - p)}} \]

is N(0, 1) when $n \to \infty$.

[Hint. Use the central limit theorem.]

6.2. ... Show that the probability of getting exactly $x$ successes is given by

\[ f(x) = \begin{cases} p^x (1 - p)^{1 - x} & \text{for } x = 0, 1 \\ 0 & \text{elsewhere.} \end{cases} \]

6.3. Three independent s.v's $X_1$, $X_2$ and $X_3$ have (1) binomial distributions with parameters $p_1 = p_2 = p_3 = 1/2$, $N_1 = 10$, $N_2 = 16$, $N_3 = 30$; (2) Poisson distributions with parameters $\lambda_1 = \lambda_2 = 3$ and $\lambda_3 = 5$; obtain the distribution of $X_1 + X_2 + X_3$.

6.4. If two independent s.v's $X_1$ and $X_2$ have the same rectangular distributions, obtain the distribution of $(X_1 + X_2)/2$.

6.5. If $n$ independent s.v's have gamma distributions with the parameters $\beta_1 = \cdots = \beta_n = 3$, $\alpha_1 = \alpha$, $\alpha_2 = 2\alpha$, ..., $\alpha_n = n\alpha$, obtain the distribution of the sum of the s.v's.

6.6. If $X_1$, $X_2$, $X_3$ are independently and normally distributed as $N(\mu_1, \sigma_1)$, $N(\mu_2, \sigma_2)$ and $N(\mu_3, \sigma_3)$ respectively, obtain the distribution of (a) $X_1 - 2X_2 + X_3$, (b) $X_1 + X_2 - 5X_3$.

6.7. If X is the number of successes in N independent trials, where the probability of success in the $i$th trial is $p_i$, obtain E(X) and Var (X).

6.3. SAMPLES FROM THEORETICAL POPULATIONS 


In chapter 1 we defined a statistical population as a set 
where every element may be characterized by one or more 
characteristics. If the population is univariate then every element 
is characterized by one characteristic. For example, the height measurements of all the students in a particular country at a particular time form a univariate finite population. Let us consider a set where the elements are defined by a probability function $f(x)$ or by a stochastic variable X, in the sense that the probability of getting a particular element $x_0$ of the set is given by $f(x_0)$ if X is a discrete variable, and the probability of getting an element in the neighbourhood of $x_0$ is given by

\[ \int_{x_0 - \Delta x_0/2}^{x_0 + \Delta x_0/2} f(x)\,dx = f(x_0)\,\Delta x_0 \]

approximately, where $\Delta x_0$ denotes a small neighbourhood at $x_0$, if X is a continuous stochastic variable. If a set is thus defined by a stochastic variable X or by a probability law $f(x)$ we say that such a set is a theoretical population designated by the stochastic variable X. This is a univariate case. The same ideas may be generalized for defining a multivariate theoretical population. For example a normal population means a set designated by a normal variable X or by the normal probability law

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\} \]

where $\mu$ and $\sigma$ are parameters. If $x_0$ is called an observation from a population $f(x)$, this means that $x_0$ is an element of the set designated by the stochastic variable X having the probability function $f(x)$. If $x_1$ and $x_2$ are two independent observations from $f(x)$ this may be considered to be the values assumed by two independent stochastic variables $X_1$ and $X_2$, each having the same distribution given by the probability function $f(x)$.

Now we will define a simple random sample from a theoretical population designated by a probability function $f(x)$. A set of stochastic variables $X_1, X_2, \ldots, X_n$ which are independently and identically distributed as $f(x)$, is called a simple random sample of size $n$ from the population $f(x)$. For example if $X_1, X_2, \ldots, X_n$ are independently and identically distributed as a normal distribution with parameters $\mu$ and $\sigma$ then $X_1, X_2, \ldots, X_n$ is called a simple random sample from a $N(\mu, \sigma)$. If $X_1, \ldots, X_n$ is a random sample from $f(x)$ then their joint probability function is

\[ f(x_1, x_2, \ldots, x_n) = f(x_1)\, f(x_2) \cdots f(x_n) \]

(since $X_1, \ldots, X_n$ are independently and identically distributed). [Here $f(x_i)$ means $f(x)$ at $x = x_i$, for $i = 1, 2, \ldots, n$.] In particular,

\[ f(x_1, x_2, \ldots, x_n) = (2\pi\sigma^2)^{-n/2}\, e^{-\sum_{i=1}^{n} (x_i - \mu)^2 / 2\sigma^2} \quad \text{if } f(x) = (2\pi\sigma^2)^{-1/2}\, e^{-(x - \mu)^2 / 2\sigma^2}. \]
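The factorization of the joint probability function of a random sample can be illustrated for the normal case. This is a small sketch under assumed values of $\mu$, $\sigma$ and the observations; the function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal probability law f(x) with parameters mu and sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def joint_pdf_product(xs, mu, sigma):
    """Joint density of a random sample as the product f(x_1) f(x_2) ... f(x_n)."""
    return math.prod(normal_pdf(x, mu, sigma) for x in xs)

def joint_pdf_closed_form(xs, mu, sigma):
    """(2 pi sigma^2)^(-n/2) exp(-sum (x_i - mu)^2 / (2 sigma^2))."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return (2 * math.pi * sigma ** 2) ** (-n / 2) * math.exp(-ss / (2 * sigma ** 2))

xs = [5.0, 8.0, 4.0, 3.0]       # illustrative observed values
mu, sigma = 5.0, 2.0            # illustrative parameters
by_product = joint_pdf_product(xs, mu, sigma)
by_formula = joint_pdf_closed_form(xs, mu, sigma)
print(by_product, by_formula)
```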

6.31. Statistics. If $X_1, \ldots, X_n$ is a simple random sample from a population $f(x)$ [i.e., $X_1, \ldots, X_n$ are independently and identically distributed as $f(x)$] then any single valued function of $X_1, \ldots, X_n$ is called a statistic. (Plural of statistic is statistics and is different from the science Statistics or a collection of observations.) This definition can be extended for a multivariate case also. For example,

\[ \bar{X} = (X_1 + \cdots + X_n)/n \ \text{is a statistic;} \]

\[ S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / n \ \text{is a statistic, etc.} \]

An infinite number of statistics can be constructed from a given random sample. In our discussions we will consider only Borel measurable functions of $X_1, \ldots, X_n$.

The distribution of a statistic is called a sampling distribution. For example, the distribution of the sample mean $\bar{X}$ is a sampling distribution. ... estimators. Estimators and their properties will be discussed in the chapter on estimation.

If we have a sample from a multivariate population we can construct a statistic for this sample in a similar way as in a univariate case. For example let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate population [this means that the random variables $(X_i, Y_i)$ have a joint distribution $f(x, y)$ for $i = 1, 2, \ldots, n$ and $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ are all independently and identically distributed as $f(x, y)$]; then the sample covariance

\[ S_{12} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})/n \ \text{is a statistic.} \]

The sample correlation coefficient
\[ r_{12} = \frac{S_{12}}{S_1 S_2} \ \text{is a statistic} \]

where $S_1^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/n$ and $S_2^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2/n$ are the sample variances. A number of statistics may be constructed from a sample from a multivariate population.

6.32. The Sample Mean. If $X_1, X_2, \ldots, X_n$ is a random sample from a population $f(x)$ then

\[ \bar{X} = (X_1 + \cdots + X_n)/n \]

is called the sample mean. Sample mean is evidently a statistic. Let us evaluate the expected value and variance of the sample mean $\bar{X}$.

\[ E(\bar{X}) = \mu \tag{6.13} \]

\[ \text{and } \mathrm{Var}(\bar{X}) = \sigma^2/n \quad \text{(See Ex. 5-6-1.)} \tag{6.14} \]

Therefore, $\bar{X}$ has the mean $\mu$ and the standard deviation $\sigma/\sqrt{n}$ where $n$ is the sample size. For example if a random sample of size $n$ is taken from an exponential distribution with parameter $\theta$ then the sample mean has the expected value $\theta$ and a variance $\theta^2/n$ since the mean and variance of a stochastic variable having an exponential distribution are $\theta$ and $\theta^2$ respectively.


6.33. The Sample Variance. If $X_1, X_2, \ldots, X_n$ is a random sample from a population with mean $\mu$ and variance $\sigma^2$ then the sample variance $S^2$ is defined as

\[ S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/n = \sum_{i=1}^{n} X_i^2/n - (\bar{X})^2 \tag{6.15} \]

where $\bar{X}$ is the sample mean. Let us evaluate the expected value of the sample variance.

\[ E(S^2) = E \sum_{i=1}^{n} (X_i - \bar{X})^2/n = E \sum_{i=1}^{n} (X_i - \mu + \mu - \bar{X})^2/n \]
\[ = E\Big[\sum_{i=1}^{n} (X_i - \mu)^2 + n(\bar{X} - \mu)^2 - 2\sum_{i=1}^{n} (X_i - \mu)(\bar{X} - \mu)\Big]\Big/n \]
\[ = \Big[\sum_{i=1}^{n} E(X_i - \mu)^2 - n E(\bar{X} - \mu)^2\Big]\Big/n \tag{6.16} \]

since $\sum_{i=1}^{n} (X_i - \mu) = n(\bar{X} - \mu)$. But $E(X_i) = \mu$, $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$. Therefore

\[ E(S^2) = [n\sigma^2 - n \cdot \sigma^2/n]/n = \frac{n - 1}{n}\,\sigma^2 \tag{6.17} \]

i.e.,
\[ E \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n} = \frac{(n - 1)}{n}\,\sigma^2 \quad \text{or} \quad E \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n - 1} = \sigma^2 = E\Big[\frac{n S^2}{n - 1}\Big] \tag{6.18} \]

Here we say that $nS^2/(n - 1)$ is an unbiased statistic for $\sigma^2$ in the sense that $E[nS^2/(n - 1)]$ is $\sigma^2$. This aspect of unbiasedness of statistics will be discussed in the chapter on point estimation. Because of this property some authors define the sample variance as $\sum_{i=1}^{n} (X_i - \bar{X})^2/(n - 1)$. But we will follow the definition in section 6.33.
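The result $E(S^2) = (n - 1)\sigma^2/n$ can be verified exactly for a small discrete population by enumerating every possible sample. The sketch below assumes, purely for illustration, a population in which the values 1 to 6 are equally likely (a fair die) and samples of size 3.

```python
from fractions import Fraction
from itertools import product

values = [1, 2, 3, 4, 5, 6]     # hypothetical population: a fair die
prob = Fraction(1, 6)           # probability of each value
n = 3                           # sample size

mu = sum(Fraction(v) * prob for v in values)
sigma2 = sum((v - mu) ** 2 * prob for v in values)

def sample_variance(xs):
    """S^2 with divisor n, as defined in (6.15)."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Exact expectation: each ordered sample of size n has probability prob**n.
expected_s2 = sum(sample_variance(xs) * prob ** n
                  for xs in product(values, repeat=n))

print(mu, sigma2, expected_s2)  # 7/2 35/12 35/18
```

Exact fractions are used so that the comparison with $(n - 1)\sigma^2/n$ is free of rounding error.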


For example, let 5, 8, 4, 3 be an observed random sample of size 4 from a population $f(x)$, and let us evaluate the observed values of the sample mean $\bar{X}$ and the sample variance $S^2$. This means that 5, 8, 4, 3 are the values assumed by the stochastic variables $X_1$, $X_2$, $X_3$ and $X_4$ respectively, where $X_1$, $X_2$, $X_3$ and $X_4$ are independently and identically distributed as $f(x)$.

The sample mean is $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$. An observed value of this stochastic variable $\bar{X}$ is given by $(5 + 8 + 4 + 3)/4 = 5$, and similarly the observed value of the sample variance is

\[ [(5 - 5)^2 + (8 - 5)^2 + (4 - 5)^2 + (3 - 5)^2]/4 = 14/4. \]

This may be taken as a value assumed by the stochastic variable $S^2$.
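The same arithmetic can be reproduced with Python's standard `statistics` module. Note the assumption made explicit in the comments: `pvariance` uses the divisor $n$, which matches the definition of $S^2$ in section 6.33, whereas `variance` uses the divisor $n - 1$.

```python
import statistics

sample = [5, 8, 4, 3]                        # the observed sample of size 4
n = len(sample)

x_bar = statistics.mean(sample)              # observed sample mean, 5
s2 = statistics.pvariance(sample)            # divisor n: 14/4 = 3.5
s2_unbiased = statistics.variance(sample)    # divisor n - 1: 14/3

print(x_bar, s2, s2_unbiased)
```

The two variances are related by `s2_unbiased = n * s2 / (n - 1)`, in agreement with (6.18).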


6.34. The Standard Error. From a random sample, we can construct an infinite number of statistics. The standard deviation (square root of the variance) of a statistic is known as the standard error of the statistic. For example in Section 6.32 we have seen that the statistic $\bar{X}$ (sample mean) has a standard deviation $\sigma/\sqrt{n}$. Hence the standard error of the sample mean is $\sigma/\sqrt{n}$ where $\sigma$ is the population standard deviation and $n$ is the sample size.

Ex. 6.34.1. Show that $\bar{X} - \bar{Y}$ has a standard error
\[ (\sigma_1^2/n_1 + \sigma_2^2/n_2)^{1/2} \]
where $\bar{X}$ is the sample mean of a sample of size $n_1$ from a population with mean $\mu_1$ and variance $\sigma_1^2$ and $\bar{Y}$ is the sample mean of a sample of size $n_2$ from a population with mean $\mu_2$ and variance $\sigma_2^2$, where the populations are assumed to be independent.

Sol. Let $T = \bar{X} - \bar{Y}$.

\[ E(T) = E(\bar{X} - \bar{Y}) = E(\bar{X}) - E(\bar{Y}) = \mu_1 - \mu_2 \tag{6.19} \]

\[ \mathrm{Var}(T) = E[T - E(T)]^2 = E[\bar{X} - \bar{Y} - (\mu_1 - \mu_2)]^2 = E[(\bar{X} - \mu_1) - (\bar{Y} - \mu_2)]^2 \]
\[ = E(\bar{X} - \mu_1)^2 + E(\bar{Y} - \mu_2)^2 \]
(the covariance term vanishes since the populations are given to be independent)
\[ = \sigma_1^2/n_1 + \sigma_2^2/n_2 \tag{6.20} \]

Therefore the standard deviation of T is $(\sigma_1^2/n_1 + \sigma_2^2/n_2)^{1/2}$.

Comments. If $\mu_1 = \mu_2$ and $\sigma_1 = \sigma_2 = \sigma$, i.e., if we take two independent samples of sizes $n_1$ and $n_2$ from the same population, then $\bar{X} - \bar{Y}$ has the standard error

\[ (\sigma^2/n_1 + \sigma^2/n_2)^{1/2} = \sigma (1/n_1 + 1/n_2)^{1/2} \tag{6.21} \]

If two stochastic variables X and Y are independent, we say that the populations represented by X and Y are independent (statements like two independent exponential populations with parameters $\theta_1$ and $\theta_2$, two independent normal populations with parameters $\mu_1$ and $\sigma_1$ and $\mu_2$ and $\sigma_2$ etc.).

Ex. 6.34.2. If X and Y denote the proportion of successes (total number of successes divided by the total number of trials) in two independent Binomial probability situations with parameters $p_1$ and $N_1$ and $p_2$ and $N_2$ respectively, find the expected value and the standard error of the statistic $T = X - Y$.

Sol. If X denotes the number of successes in a Binomial situation with parameters $p$ and N then the proportion of success $Z = X/N$ has an expected value $p$ and a variance $p(1 - p)/N$. This is easily seen, since $Z = X/N = (1/N) \cdot X$ and $1/N$ is a constant. Hence $E(Z) = (1/N) \cdot E(X) = (1/N) \cdot Np = p$ and $\mathrm{Var}(Z) = (1/N)^2\, \mathrm{Var}(X) = (1/N)^2\, N p(1 - p) = p(1 - p)/N$.

\[ T = X - Y \]

\[ E(T) = E(X) - E(Y) = p_1 - p_2 \tag{6.22} \]

\[ \mathrm{Var}(T) = \mathrm{Var}(X) + \mathrm{Var}(Y) \quad \text{(since the Binomial populations are given to be independent)} \]
\[ = p_1(1 - p_1)/N_1 + p_2(1 - p_2)/N_2 \tag{6.23} \]

The standard error of T is

\[ [p_1(1 - p_1)/N_1 + p_2(1 - p_2)/N_2]^{1/2}. \]
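The expected value and standard error obtained in Ex. 6.34.2 can be checked by direct enumeration over the two Binomial probability functions. The following is a rough sketch; the parameter values are arbitrary and the function names are only illustrative.

```python
import math

def binom_pmf(N, p):
    """Binomial probabilities P(X = x) for x = 0, 1, ..., N."""
    return [math.comb(N, x) * p ** x * (1 - p) ** (N - x) for x in range(N + 1)]

def se_diff_proportions(p1, N1, p2, N2):
    """Standard error of T: the square root of (6.23)."""
    return math.sqrt(p1 * (1 - p1) / N1 + p2 * (1 - p2) / N2)

def exact_mean_var(p1, N1, p2, N2):
    """E(T) and Var(T) by summing over all pairs of success counts."""
    f1, f2 = binom_pmf(N1, p1), binom_pmf(N2, p2)
    mean = second = 0.0
    for x, fx in enumerate(f1):
        for y, fy in enumerate(f2):
            t = x / N1 - y / N2         # difference of the two proportions
            mean += t * fx * fy         # independence: joint probability fx * fy
            second += t * t * fx * fy
    return mean, second - mean ** 2

p1, N1, p2, N2 = 0.3, 5, 0.6, 8         # illustrative parameters
mean_T, var_T = exact_mean_var(p1, N1, p2, N2)
se_T = se_diff_proportions(p1, N1, p2, N2)
print(mean_T, var_T, se_T)
```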


In example 6.1.1 we have seen that if $X_1, X_2, \ldots, X_n$ are independent and if $X_i : N(\mu_i, \sigma_i)$ for $i = 1, 2, \ldots, n$ then $Y = X_1 + X_2 + \cdots + X_n$ has a normal distribution with parameters, mean $= \mu_1 + \mu_2 + \cdots + \mu_n$ and variance $= \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$. As a special case of this, if $X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu, \sigma)$ then

\[ \bar{X} = (X_1 + X_2 + \cdots + X_n)/n \]

has a normal distribution with mean $\mu$ and with variance $\sigma^2/n$, i.e., $\bar{X} : N(\mu, \sigma/\sqrt{n})$ if $X_1, \ldots, X_n$ is a random sample from $N(\mu, \sigma)$. Hence

\[ Z = \frac{\bar{X} - E(\bar{X})}{\sqrt{\mathrm{Var}(\bar{X})}} = \frac{(\bar{X} - \mu)}{\sigma/\sqrt{n}} \]

has a normal distribution with zero mean and unit variance, i.e., $Z : N(0, 1)$.

Thus if a random sample of size $n$ is taken from $N(\mu, \sigma)$ then the standardized sample mean has a standardized normal distribution, whatever the sample size $n$ may be. Now we will examine the distribution of the standardized sample mean if a random sample of size $n$ is taken from any population $f(x)$. One form of the central limit theorem gives the distribution of the standardized sample mean for any parent distribution.

6.35. The Central Limit Theorem. 

Theorem 6.2. If $X_1, X_2, \ldots, X_n$ is a random sample from a population $f(x)$ with finite mean and variance then

\[ Z = [\bar{X} - E(\bar{X})] / \sqrt{\mathrm{Var}(\bar{X})} \]

has a distribution which approximates to a standardized normal distribution when the sample size is sufficiently large,

i.e., if
\[ Z = \frac{(\bar{X} - \mu)}{\sigma/\sqrt{n}} \]
then
\[ Z : N(0, 1) \ \text{when } n \to \infty \tag{6.24} \]

where $\mu = E(X_i)$ and $\sigma^2 = \mathrm{Var}(X_i)$ for $i = 1, 2, \ldots, n$.

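Proof. Let $M_X(t)$ be the M.G.F. of the parent population, or $M_{X_i}(t) = M_X(t)$ for $i = 1, 2, \ldots, n$.

Let $Y = (X_1 + X_2 + \cdots + X_n)$, then

\[ M_Y(t) = [M_X(t)]^n \quad \text{(Theorem 6.1)} \tag{6.25} \]

Since $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n}) = (Y - n\mu)/(\sigma\sqrt{n})$,

\[ M_Z(t) = e^{-\sqrt{n}\,\mu t/\sigma} \left[ M_X\!\left( \frac{t}{\sigma\sqrt{n}} \right) \right]^n \]

\[ \log M_Z(t) = -\mu\sqrt{n}\, t/\sigma + n \log M_X\!\left( \frac{t}{\sigma\sqrt{n}} \right) \tag{6.26} \]

(where 'log' denotes the natural logarithm).

But
\[ M_X\!\left( \frac{t}{\sigma\sqrt{n}} \right) = 1 + \frac{\mu_1'}{1!} \left( \frac{t}{\sigma\sqrt{n}} \right) + \frac{\mu_2'}{2!} \left( \frac{t}{\sigma\sqrt{n}} \right)^2 + \cdots \]

where $\mu_1' = \mu, \mu_2', \ldots$ are the raw moments of X

\[ = 1 + T \ \text{(say)} \tag{6.27} \]

\[ \log(1 + T) = T - T^2/2 + T^3/3 - \cdots \quad \text{(assuming the validity of the expansion)} \tag{6.28} \]

Therefore
\[ \log M_Z(t) = -\mu\sqrt{n}\, t/\sigma + n \left[ T - \frac{T^2}{2} + \frac{T^3}{3} - \cdots \right] \]

where
\[ T = \mu_1' (t/\sigma\sqrt{n}) + \frac{\mu_2'}{2!} (t/\sigma\sqrt{n})^2 + \cdots \tag{6.29} \]

Now collecting the coefficients of $t, t^2, \ldots$ in $\log M_Z(t)$ we get

\[ \log M_Z(t) = (-\mu\sqrt{n} + \mu_1'\sqrt{n})(t/\sigma) + \tfrac{1}{2}(\mu_2' - \mu_1'^2)(t/\sigma)^2 + \cdots \tag{6.30} \]

Since $\mu_1' = \mu$, the coefficient of $t/\sigma$ is zero, and the coefficients of $(t/\sigma)^3, (t/\sigma)^4, \ldots$ contain powers of $\sqrt{n}$ in the denominators; for example, the coefficient of $(t/\sigma)^3$ contains $\sqrt{n}$ in the denominator, etc.

Hence, when $n \to \infty$,

\[ \log M_Z(t) \to \tfrac{1}{2}(\mu_2' - \mu_1'^2)(t/\sigma)^2 = t^2/2 \tag{6.31} \]

since $\mu_2' - \mu_1'^2 = \sigma^2$, i.e.,

\[ M_Z(t) \to e^{t^2/2} \ \text{when } n \to \infty. \tag{6.32} \]

By the uniqueness property of moment generating functions, Z is N(0, 1) when $n \to \infty$ since N(0, 1) has a M.G.F. $= e^{t^2/2}$. This completes the proof.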
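The limit in (6.31) can be observed numerically for a particular parent population. The sketch below assumes an exponential parent with parameter $\theta$, for which $M_X(s) = (1 - \theta s)^{-1}$ and $\mu = \sigma = \theta$, so that (6.26) reduces to $-\sqrt{n}\,t - n \log(1 - t/\sqrt{n})$; the function name is hypothetical.

```python
import math

def log_mgf_standardized_mean(t, n):
    """log M_Z(t) from (6.26) for an exponential parent population.

    With M_X(s) = 1/(1 - theta*s) and mu = sigma = theta, theta cancels:
    log M_Z(t) = -sqrt(n)*t - n*log(1 - t/sqrt(n)), valid for t < sqrt(n).
    """
    root_n = math.sqrt(n)
    if t >= root_n:
        raise ValueError("the M.G.F. exists only for t < sqrt(n)")
    return -root_n * t - n * math.log1p(-t / root_n)

t = 1.0
sizes = [10, 100, 10_000, 1_000_000]
gaps = [abs(log_mgf_standardized_mean(t, n) - t ** 2 / 2) for n in sizes]
for n, gap in zip(sizes, gaps):
    print(n, gap)       # the gap from t^2/2 shrinks roughly like 1/sqrt(n)
```

The leading error term is $t^3/(3\sqrt{n})$, which is the first of the terms with $\sqrt{n}$ in the denominator dropped in passing from (6.30) to (6.31).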


The reader might have already realized the importance of this theorem. According to the theorem if $X_1, X_2, \ldots, X_n$ is a random sample from an exponential population with parameter $\theta$ (sec. 4.12), then

\[ Z = \frac{\bar{X} - E(\bar{X})}{\sqrt{\mathrm{Var}(\bar{X})}} = \frac{(\bar{X} - \theta)}{\theta/\sqrt{n}} \]

is approximately N(0, 1) when $n$ is sufficiently large. A good approximation when the parent population is symmetric is usually obtained when $n \ge 30$. For a detailed discussion of the various central limit theorems and other theorems on convergence of stochastic variables see M. Loeve, Probability Theory, D. Van Nostrand Co., Inc., New York, 1955.


Exercises 

6.8. Obtain the distribution of $\bar{X}$ when the sample of size $n$ is ...

6.9. If $X_1, \ldots, X_n$ is a random sample from a $N(\mu, \sigma)$ and $n$ has a binomial distribution with the parameters $p$ and N, obtain (1) E(Y), (2) Var (Y), where $Y = X_1 + \cdots + X_n$ (see problem 5.15).

6.10. If $X_1, \ldots, X_n$ is a random sample from a $N(\mu, \sigma)$ obtain the distribution of ...

6.11. If ..., (5, 8) is an observed random sample from a bivariate population, obtain the sample means, sample variances and the sample correlation coefficient.

6.12. If ... is a random sample of size 2 from a bivariate population, obtain ... (1) ..., (2) the ...

6.13. If three independent samples of sizes $n_1$, $n_2$, $n_3$ are taken from three populations with means ... and with variances $\sigma_1^2$, $\sigma_2^2$ and $\sigma_3^2$ respectively, what is the standard error of the statistic $\bar{X}$ ... $\bar{Y} + 2\bar{Z}$ where $\bar{X}$, $\bar{Y}$ and $\bar{Z}$ denote the sample means?

6.14. If two independent samples of sizes $n_1$ and $n_2$ are taken from the same population with mean $\mu$ and variance $\sigma^2$, what is the standard error of the statistic $\bar{X}$ ... $2\bar{Y}$, where $\bar{X}$ and $\bar{Y}$ denote the sample means?

6.15. Two random samples of sizes 10 and 20 are taken from a population with variance $\sigma^2 = 16$. By using Chebyshev's theorem obtain the limit for the probability that the sample means will not differ by more than 3 units.

6.16. Two independent random samples of sizes 20 and 30 are taken from two populations with the parameters $\mu_1 = \mu_2$ and $\sigma_1 = 10$ and $\sigma_2 = $ ... Using Chebyshev's theorem what can we assert, with a probability of at least 0.80, about the possible difference of the sample means?

6.17. Ten per cent of the oranges in a grocery store are spoiled. A housewife picks up 20 oranges at random. Obtain an upper bound for the probability that the proportion of spoiled oranges in this sample differs from the true proportion by more than 1%.

6.18. A random sample of size 50, taken from a population with parameters $\mu = 20$ and $\sigma = 4$, has a sample mean 25. If random samples of size 50 are taken, what is the probability of getting a sample mean as large as 25?

6.19. Two independent random samples of size 100 each are taken from two populations with the parameters $\mu_1 = 20$, $\mu_2 = 20$, $\sigma_1 = 5$, $\sigma_2 = 6$. If sampling is continued, what is the probability of getting two sample means which differ at the most by 5?

6.4. ORDER STATISTICS 

We will develop the theory for the case when the parent population is continuous. The ideas may be extended to a discrete population as well. Let $x_1, x_2, \ldots, x_n$ be an observed random sample from a continuous population $f(x)$. These observations may be ordered according to the order of their magnitudes. Let

\[ u_1 < u_2 < \cdots < u_n \]

be the arrangement of $x_1, x_2, \ldots, x_n$, where $u_1$ is the smallest one, $u_n$ is the largest one and $u_r$ is the $r$th in order of magnitude (the $r$th smallest). $u_1, u_2, \ldots, u_n$ may be considered to be the values assumed by the stochastic variables $U_1, U_2, \ldots, U_n$. For example if we take a number of random samples of size $n$ and order the observations according to their magnitudes in each sample, we get a number of values assumed by $U_1, U_2, \ldots, U_n$. The stochastic variable $U_r$ is called the $r$th order statistic and the distribution of $U_r$ is called the sampling distribution of the $r$th order statistic. $u_r$ is the $r$th smallest of the x's and hence $(r - 1)$ of the x's fall below $u_r$ and $n - r$ of the x's fall above $u_r$. For convenience we assume that there is only one observation which is the $r$th in order of magnitude.

Let $p_1$, $p_2$ and $p_3$ be the probabilities that an observed value of the stochastic variable X falls below $u_r$, in the interval $(u_r, u_r + h)$, and in the interval $(u_r + h, \infty)$ respectively, where $h$ is a very small quantity or $(u_r, u_r + h)$ is a very small interval at $u_r$ (see Fig. 6.1).

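[Fig. 6.1. The real line divided at $u_r$ and $u_r + h$: $r - 1$ observations fall below $u_r$, one falls in $(u_r, u_r + h)$ and $n - r$ fall above $u_r + h$.]

\[ p_1 = \int_{-\infty}^{u_r} f(x)\,dx = F(u_r) \tag{6.33} \]

\[ p_2 = \int_{u_r}^{u_r + h} f(x)\,dx = f(\eta) \cdot h, \quad \text{where } u_r \le \eta \le u_r + h \tag{6.34} \]

(by the mean value theorem)

\[ p_3 = \int_{u_r + h}^{\infty} f(x)\,dx = 1 - F(u_r + h) \tag{6.35} \]

When $h \to 0$, $p_2 \approx f(u_r)\,h$ where $f(u_r)$ is the density function $f(x)$ at $x = u_r$ (not the density function of $U_r$) and $p_1 + p_2 + p_3 = 1$. We have observed $r - 1$ of the x's below $u_r$, one in the interval $(u_r, u_r + h)$ and $n - r$ above $u_r + h$. By the multinomial probability law the probability of this event is

\[ \frac{n!}{(r - 1)!\,1!\,(n - r)!}\; p_1^{\,r-1}\, p_2^{\,1}\, p_3^{\,n-r} . \]

Dividing by $h$ and letting $h \to 0$ we get

\[ g(u_r) = \frac{n!}{(r - 1)!\,(n - r)!} \left[ \int_{-\infty}^{u_r} f(x)\,dx \right]^{r-1} f(u_r) \left[ \int_{u_r}^{\infty} f(x)\,dx \right]^{n-r} \tag{6.36} \]

where $g(u_r)$ is the density function of $U_r$ and $f(u_r)$ is $f(x)$ at $x = u_r$. For example the density functions of the smallest order statistic $U_1$ and the largest order statistic $U_n$ are given as

\[ g(u_1) = g(u_r) \ \text{for } r = 1 \ = \frac{n!}{(n - 1)!}\, f(u_1) \left[ \int_{u_1}^{\infty} f(x)\,dx \right]^{n-1} = n \cdot f(u_1) \left[ \int_{u_1}^{\infty} f(x)\,dx \right]^{n-1} \ \text{for } -\infty < u_1 < \infty \tag{6.37} \]

\[ g(u_n) = n \cdot f(u_n) \left[ \int_{-\infty}^{u_n} f(x)\,dx \right]^{n-1} \ \text{for } -\infty < u_n < \infty. \tag{6.38} \]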

6.41. Joint distribution of $U_r$ and $U_s$ where $r \ne s$.

The joint distribution of two order statistics $U_r$ and $U_s$ where $r < s$ may be obtained in a similar fashion. This is illustrated in Fig. 6.2.

[Fig. 6.2. The real line divided at $u_r$, $u_r + h_1$, $u_s$ and $u_s + h_2$: $r - 1$ observations fall below $u_r$, $s - r - 1$ fall between $u_r + h_1$ and $u_s$, and $n - s$ fall above $u_s + h_2$.]

The joint density function of $U_r$ and $U_s$ is

\[ g(u_r, u_s) = \frac{n!}{(r - 1)!\,(s - r - 1)!\,(n - s)!} \left[ \int_{-\infty}^{u_r} f(x)\,dx \right]^{r-1} f(u_r) \left[ \int_{u_r}^{u_s} f(x)\,dx \right]^{s-r-1} f(u_s) \left[ \int_{u_s}^{\infty} f(x)\,dx \right]^{n-s} \]
\[ \text{for } -\infty < u_r < u_s < \infty \tag{6.39} \]

The sampling distributions of the median and the range are obtained from equations (6.36) and (6.39) respectively. For example if we have $2m + 1$ observations the $(m + 1)$th ordered observation is the median of the observations and hence the distribution of the median M is obtained by replacing $n$ by $2m + 1$ and $r$ by $m + 1$ in equation (6.36). $U_n - U_1$ is the sample range and hence the distribution of the range R may be obtained from the joint distribution of $U_n$ and $U_1$. Sometimes it may be difficult to obtain the exact values of the integrals in the equations. In such cases we may obtain approximate values by some approximation procedures.

Ex. 6.4.1. Obtain the distribution of the smallest order statistic $U_1$ from an exponential population with the parameter $\theta$.

Sol.
\[ f(x) = \begin{cases} \dfrac{1}{\theta}\, e^{-x/\theta} & \text{for } 0 < x < \infty \\ 0 & \text{elsewhere} \end{cases} \]

\[ \int_{u_1}^{\infty} f(x)\,dx = e^{-u_1/\theta} \tag{6.40} \]

Therefore the density function for the smallest order statistic $U_1$ is

\[ g(u_1) = n \cdot f(u_1) \left[ \int_{u_1}^{\infty} f(x)\,dx \right]^{n-1} = n \cdot \frac{1}{\theta}\, e^{-u_1/\theta} \left[ e^{-u_1/\theta} \right]^{n-1} \]
\[ = \begin{cases} \dfrac{n}{\theta}\, e^{-(n/\theta) u_1} & \text{for } 0 < u_1 < \infty \\ 0 & \text{elsewhere.} \end{cases} \tag{6.41} \]

Comments. The distribution of $U_1$ is again an exponential distribution, with $1/\theta$ multiplied by the sample size $n$; that is, $U_1$ has an exponential distribution with the parameter $\theta/n$. This is a characteristic property of some distributions. A family or a set of distributions may be characterized by using this property.
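The conclusion of Ex. 6.4.1 may also be checked by simulation: if $U_1$ is exponential with parameter $\theta/n$, its mean is $\theta/n$ and $P(U_1 > u) = e^{-nu/\theta}$. The sketch below is only a rough illustration with arbitrary values of $n$ and $\theta$; it uses `random.expovariate`, whose argument is the rate $1/\theta$, and a fixed seed so that the run is repeatable.

```python
import math
import random

def simulate_smallest(n, theta, reps, seed=0):
    """Observed values of U_1, the smallest of n exponential observations."""
    rng = random.Random(seed)
    return [min(rng.expovariate(1.0 / theta) for _ in range(n))
            for _ in range(reps)]

n, theta = 5, 2.0                       # illustrative sample size and parameter
mins = simulate_smallest(n, theta, reps=20000)

mean_min = sum(mins) / len(mins)        # compare with theta / n
u = 0.4
tail_freq = sum(m > u for m in mins) / len(mins)   # compare with exp(-n*u/theta)
print(mean_min, theta / n)
print(tail_freq, math.exp(-n * u / theta))
```

The simulated values agree with the theoretical ones only up to sampling error, which for 20000 repetitions is of the order of a few thousandths here.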

Exercises 

6.20. Obtain the distribution of the smallest and the largest order statistics from a sample of size $n$ if the parent population is

\[ f(x) = \begin{cases} 1/(\beta - \alpha) & \text{for } \alpha < x < \beta,\ \alpha > 0 \\ 0 & \text{elsewhere.} \end{cases} \]

6.21. For the distribution in problem 6.20 obtain the distribution of the sample range $(U_n - U_1)$ by obtaining the joint distribution of $U_1$ and $U_n$.

6.22. Obtain the distribution of the sample median if a random sample of size $2n + 1$ is taken from a normal population.

6.23. For large $n$ it can be proved that the distribution of the sample median $m$ of a sample of size $2n + 1$ is approximately normal with mean the population median M and with variance $1/(8n[f(M)]^2)$ where $f(M)$ is the population density function $f(x)$ at $x = M$. Obtain the variance of $m$ if $f(x)$ is a $N(\mu, \sigma)$, and if the sample size is large.

6.24. Obtain the distribution of the sample range R if samples of size $n$ are taken from an exponential population. Compare the distribution with the parent distribution when $n = 2$.

6.5. SAMPLING FROM A FINITE POPULATION 

The following is a note on sampling from a finite population. For a detailed discussion see any book on Sampling. Definitions and examples of univariate as well as multivariate populations are given in chapter 1 and in section 6.3. Let us consider a finite univariate population given by a set of observations or data. For example, the annual incomes of all the citizens in a city in a particular year, the height measurements of all the students in the universities in a particular country at a particular time, the set of bullets produced by a machine in a certain time interval etc., all form univariate finite populations. If we are interested in studying some population characteristics such as the mean $\mu$, a measure
of dispersion (say the standard deviation) etc., it may not be 
possible to examine every element in the population. For 
example if we want to know about the average life time of electric 
bulbs produced by a particular process, if we test every element 
in this population, there will not be any electric bulbs left for 
sale. It may not be economical to conduct a survey of all citizens 
in India to find out the average income of the citizens. The time, 
money and numerous other factors involved, may compel us to 
adopt some other procedure in order to study the population 
characteristics of a given finite population. The method usually 
used is to take a subset (sample) from the given set (population) 
in such a way that the inferences or results in the subset can be 
generalized to the population. This is a process of induction. If 
our inductive inference is to be valid the subset (sample) should be 
a representative of the population in some sense. 

If a sample is selected in such a way that every element in 
the population is given equal chances of being included in the 
sample, then such a sample is called a simple random sample from 
a finite population. However such a restriction on the selection is 
not necessary for the application of probability theory. For a 
detailed discussion see bibliography [5] at the end of this chapter. 
A simple random sample may be selected from a given population 
by using a table of random numbers. In such a table a set of 
numbers are given where the numbers from 0 to 9 occur with 
approximately equal proportions, the numbers from 0 to 99 occur 
with approximately equal proportions etc. We can number the 
elements in a given population of size N, from 1 to N and a simple 
random sample of size n can be selected by using the table of 
random numbers such that every element in the population is 
given an equal chance of being included in the sample. Such a 
sample may be considered to be a representative sample and the 
population characteristics may be estimated by some functions of 
the sample observations. For example the population mean may 
be estimated by the sample mean. The properties of such esti¬ 
mates will be discussed in the chapter on estimation. A random 
sample may be selected from a given finite population by using 
a random experiment. For example, the elements in the population can be numbered from 1 to N. These numbers can be written on cards and cards are drawn one by one with replacement from the well shuffled deck of these N cards. Thus a sample of any required size may be obtained. For a detailed discussion on the construction of random numbers see the bibliography [5] at the end of this chapter.

In some cases the given population may be divided into different strata and then a simple random sample is taken from each stratum. Such sampling procedures are called stratified sampling. For example, in order to study the average annual expenses of the families in a city the families may be divided into different income groups and simple random samples may be taken from each stratum. Sometimes sampling is done in different stages. Such sampling procedures are called multistage sampling. In order to study the attack of a particular disease in a country, a few administrative districts may be selected at random and from these districts some villages may be selected at random and a survey may be conducted in these villages. Instead of taking a random sample of pre-assigned size sometimes a sequential sampling procedure is adopted. Sampling is continued or stopped based on the information obtained at every step. For a detailed discussion of these sampling procedures the reader is advised to read books on sampling.
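On a computer, a pseudo-random number generator usually plays the role of the table of random numbers. The following sketch shows, under that assumption, how a simple random sample (without replacement), a sample drawn with replacement, and a stratified sample might be selected; the population, the strata and the function names are all made up for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """n distinct elements, every element having an equal chance of inclusion."""
    return rng.sample(population, n)

def sample_with_replacement(population, n, rng):
    """n draws, each card returned to the deck before the next draw."""
    return [rng.choice(population) for _ in range(n)]

def stratified_sample(strata, sizes, rng):
    """A simple random sample of the stated size from each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

rng = random.Random(12345)                  # fixed seed, so the run is repeatable
population = list(range(1, 101))            # elements numbered from 1 to N = 100

srs = simple_random_sample(population, 10, rng)
wr = sample_with_replacement(population, 10, rng)

# Hypothetical strata: three "income groups" of the same population.
strata = {"low": population[:50], "middle": population[50:85], "high": population[85:]}
strat = stratified_sample(strata, {"low": 5, "middle": 3, "high": 2}, rng)
print(srs, wr, strat, sep="\n")
```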


Exercises 

6.25. If an ordered set of $n$ elements are taken at random without replacement from a set of N elements, show that the probability $f(x_1, \ldots, x_n)$ of getting $n$ elements is

\[ f(x_1, \ldots, x_n) = 1/[N(N - 1) \cdots (N - n + 1)] \]

(If all such sets of $n$ elements have the same probability we say that $x_1, \ldots, x_n$ is a random sample from a finite population with N elements.)

6.26. Under the assumptions in problem 6.25 show that the marginal distributions are (1) $f(x_i) = 1/N$ for $x_i = a_1, \ldots, a_N$, where $a_1, \ldots, a_N$ are the elements in the population, (2) $f(x_i, x_j) = 1/[N(N - 1)]$ where $f(x_i, x_j)$ denotes the joint probability function of $X_i$ and $X_j$ for some $i$ and $j$.

6.27. If $\bar{X}$ is the mean of a random sample of size $n$ selected from a finite population of size N, with mean $\mu$ and with variance $\sigma^2$, show that

\[ (1)\ E(\bar{X}) = \mu, \quad (2)\ \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}. \]

6.28. If a random sample of size $n$ is taken with replacement from a finite population of size N with mean $\mu$ and with variance $\sigma^2$, obtain

(1) $E(\bar{X})$, (2) $\mathrm{Var}(\bar{X})$

where $\bar{X}$ denotes the sample mean.


• 6 oft 9 o n ? y S !£ g a of random numbers select 10 samples each c 
' 5 >q 9 fiTn ^- f ; T« a and obtain the sample means. 25,2' 

it' 3o' S)’ tl’ 9Q* l*’ li' on oo' % 2 o' 31 ’ 35 ' 36 ’ 34 ’ 32 ' 39 > 3i > 27 > 2S > 31 

3l' 28* 27’ 26' 2Q ll' S’ fe’J 5 ’ 36 ' 34 ' 32< 33 ' 3l ' 32 ’ 3f) > 3 ^ 3t ’ 2£ 

38 32 20 22 fl’ 2 ?A 2 0 2, 23 ’ H > 20 ’ 26 ‘ 28 ’ 29 - 34, 36, 3^ 

6 , 62, 20 , 22, 21, 24, *-3, 25, 27, 39, 40, 38, 37. 34 30 29 40 23 99 9% 94 21 

20, 28, 29, 35, 36, 38, 37, 34, 36, 25, 27, 26, 23, 24, 21. ' ’ * 2 ' ' ’ 

6.30. Obtain the approximate distributions of the sample means of the various sizes in problem 6.29, by forming a frequency table of the observed sample means for the various sample sizes. Draw the three curves on the same graph.

6.31. Draw a normal curve $N(\mu, \sigma)$ with the parameter $\mu$ = the arithmetic mean of the sample means and $\sigma^2$ = the variance of the sample means, of size 30 in problem 6.29. Draw this normal curve and the approximate distribution of the sample mean of size 30 obtained in problem 6.29, on the same graph.

6.32. By choosing 5 intervals of equal lengths, form a frequency table for the data in problem 6.29. Represent the frequencies by a histogram. Draw a smooth curve which is most appropriate for this histogram.


6.6. ACCEPTANCE SAMPLING 

In an industrial production process where a particular article 
is produced in a large scale or in large number in a short interval 
of time, it is not practicable to examine each and every item in 
order to control the quality of the product. In such a situation 
production engineers use quality control methods and a brief 
account of it is given in chapter 8. In a mass production process, 
even if an item is produced by using the best equipment available, there will be some variations from item to item. If a particular part of a precision equipment is produced on a large scale, even the slightest departure from the specifications may make the part useless as far as the consumer is concerned. In such large scale productions the items are shipped to the customer in lots whose sizes may run into thousands.


Even if a good quality control method is used, some defective items may be included in the lots. Examination of each and every item in a lot may cost more than the cost of production, and if the examination can be done only by destroying the item, complete examination is not practicable either. If the manufacturer cannot eliminate all defectives from every lot he would like to reduce the number of defectives to a minimum, and he would also like to judge the number of defectives in a lot by examining the smallest possible number of items. As a criterion he can select a random sample of size $n$ from a lot of size $N$ and examine the $n$ items. If more than $c$ of them are defective he can stop the shipment of the lot, or he can examine all the $N$ items and replace the defectives by good ones. This sampling plan is based on a single sample, that is, only one sample is taken. Such sampling plans are called single sampling plans, and plans based on more than one sample are called multiple sampling plans.


6.61. The Operating Characteristic Curve. If the sampling plan is to accept a lot of size $N$ if the number of defectives in a random sample of size $n$ is less than or equal to $c$, then the probability of acceptance is

$$p(\theta) = \sum_{x=0}^{c} \frac{\binom{N\theta}{x}\binom{N-N\theta}{n-x}}{\binom{N}{n}}$$

where $\theta$ denotes the fraction defective in the lot. $\theta$ can be $0$ or $1/N, 2/N, \dots, N/N$; in other words there may be 0 or 1 or 2 or ... $N$ defectives in the lot of size $N$. $p(\theta)$ varies according to $\theta$, and if $p(\theta)$ is plotted against $\theta$ we get a curve called the operating characteristic curve (OC-curve) of the single sampling plan, as shown in Fig. 6.1.

Fig. 6.1. OC-curve ($p(\theta)$ plotted against $\theta = 0, 1/N, 2/N, \dots$)

When $N$ is large the hypergeometric probability $p(\theta)$ can be approximated by a Binomial probability (Ex. 4.3.1), and hence $p(\theta)$ can be approximated by

$$p(\theta) \approx \sum_{x=0}^{c} \binom{n}{x}\,\theta^{x}(1-\theta)^{n-x}.$$

Further, when $n \to \infty$ and $\theta \to 0$ but $n\theta \to \lambda$ (a constant), $p(\theta)$ can be approximated by a Poisson probability (see section 4.24) as

$$p(\theta) \approx \sum_{x=0}^{c} \frac{\lambda^{x} e^{-\lambda}}{x!}.$$
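The three forms of the acceptance probability can be compared numerically. The following is a minimal sketch, not part of the original text: the function names and the lot size, sample size and acceptance number used in the loop are illustrative assumptions, and only the Python standard library is used.

```python
from math import comb, exp, factorial

def p_accept_hypergeometric(N, n, c, defectives):
    """Exact acceptance probability for a lot of N items holding `defectives` bad ones."""
    total = comb(N, n)
    good = N - defectives
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible splits of the sample contribute nothing.
    return sum(comb(defectives, x) * comb(good, n - x) for x in range(c + 1)) / total

def p_accept_binomial(n, c, theta):
    """Binomial approximation, appropriate when N is large."""
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(c + 1))

def p_accept_poisson(n, c, theta):
    """Poisson approximation with lambda = n * theta."""
    lam = n * theta
    return sum(lam**x * exp(-lam) / factorial(x) for x in range(c + 1))

if __name__ == "__main__":
    N, n, c = 1000, 20, 1  # illustrative values only
    for defectives in (0, 20, 50, 100, 200):
        theta = defectives / N
        print(theta,
              round(p_accept_hypergeometric(N, n, c, defectives), 4),
              round(p_accept_binomial(n, c, theta), 4),
              round(p_accept_poisson(n, c, theta), 4))
```

Plotting any one column against $\theta$ gives the corresponding OC-curve.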


6.62. Producer's and Consumer's Risks. Suppose that the producer (seller) and consumer (buyer) agree to call a lot acceptable if the fraction defective $\theta \le \theta_0$ and unacceptable if $\theta \ge \theta_1$, and also that they agree on the same sampling inspection procedure. Here $\theta_0$ and $\theta_1$ are known as the acceptable quality level (AQL) and the lot tolerance percent defective (LTPD) respectively. Lots with fraction defective $\theta$ such that $\theta_0 < \theta < \theta_1$ may be called lots of indifferent quality. In this sampling inspection procedure it is possible that the producer may scrap a lot when it is really acceptable, and the consumer may accept a lot when it is really unacceptable, that is, when $\theta \ge \theta_1$. The probability, $\alpha$, that a good lot ($\theta \le \theta_0$) is rejected is called the producer's risk, and the probability, $\beta$, that a bad lot ($\theta \ge \theta_1$) is accepted is called the consumer's risk. These are also known as the probabilities of the Type I and Type II errors respectively, in the theory of testing of hypotheses. If the seller and buyer agree on the values of $\alpha$, $\beta$, $\theta_0$ and $\theta_1$, then at least in the large lot case ($N$ large) a sampling plan can be fixed ($n$ and $c$ can be fixed).
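One way of fixing $n$ and $c$ from agreed values of $\alpha$, $\beta$, $\theta_0$ and $\theta_1$ is a direct search. The sketch below assumes the Binomial approximation and takes the two risks at the boundary values $\theta_0$ and $\theta_1$; the function `find_plan` and the numerical risks in the usage line are hypothetical illustrations, not values from the text.

```python
from math import comb

def p_accept(n, c, theta):
    """Binomial approximation to the probability of accepting the lot."""
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(c + 1))

def find_plan(theta0, theta1, alpha, beta, n_max=1000):
    """Return the smallest n, with its c, such that
    P(reject | theta0) <= alpha and P(accept | theta1) <= beta."""
    for n in range(1, n_max + 1):
        for c in range(n + 1):
            if 1 - p_accept(n, c, theta0) <= alpha:
                # This is the smallest c protecting the producer for this n;
                # any larger c would only increase the consumer's risk.
                if p_accept(n, c, theta1) <= beta:
                    return n, c
                break
    return None  # no plan found within n_max

if __name__ == "__main__":
    # Hypothetical agreement: AQL 1%, LTPD 10%, risks 0.05 and 0.10.
    print(find_plan(0.01, 0.10, 0.05, 0.10))
```

The search relies on the fact that, for fixed $n$, raising $c$ lowers the producer's risk and raises the consumer's risk.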

Ex. 6.6.1. In a single sampling plan, calling for a sample of size $n = 20$, the acceptance number is $c = 1$. Assuming that the lot is large and that a binomial approximation is appropriate, calculate the probabilities of accepting a lot when there are 10% defectives and rejecting a lot when there are only 5% defectives.

Sol. The probability of accepting a lot when the percentage defective is 10% is

$$\sum_{x=0}^{1}\binom{20}{x}(0.1)^{x}(0.9)^{20-x} = (0.9)^{20} + 20(0.1)(0.9)^{19} = (2.9)(0.9)^{19}.$$

The probability of rejecting a lot when the percentage defective is 5% is

$$\sum_{x=2}^{20}\binom{20}{x}(0.05)^{x}(0.95)^{20-x} = 1 - \sum_{x=0}^{1}\binom{20}{x}(0.05)^{x}(0.95)^{20-x} = 1 - (1.95)(0.95)^{19}.$$


Comments. If the incoming quality of the lot is $\theta$, if the defectives in an unacceptable lot are replaced by non-defectives before shipment, and if $p(\theta)$ is the probability of accepting a lot, then $\theta\,p(\theta)$ can be defined as the average outgoing quality (AOQ). $p(\theta)$ is the proportion of lots accepted in the long run, and such lots contain a proportion $\theta$ of defectives. $1 - p(\theta)$ is the proportion of lots rejected in the long run, and after replacement these contain no defectives.

Hence AOQ $= \theta\,p(\theta) + 0\,[1 - p(\theta)] = \theta\,p(\theta)$.
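As an illustration of the AOQ formula, the short sketch below tabulates $\theta\,p(\theta)$ under a Binomial approximation. The plan $n = 20$, $c = 2$ and the grid of $\theta$ values are assumptions chosen only for the example.

```python
from math import comb

def p_accept(n, c, theta):
    """Binomial approximation to the probability of accepting the lot."""
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(c + 1))

def aoq(n, c, theta):
    """Average outgoing quality: accepted lots keep fraction theta of defectives,
    rejected lots are fully rectified and contribute none."""
    return theta * p_accept(n, c, theta)

if __name__ == "__main__":
    for percent in range(0, 55, 5):
        theta = percent / 100
        print(theta, round(aoq(20, 2, theta), 4))
```

The AOQ is zero at $\theta = 0$, rises, and falls back towards zero as nearly every lot is rejected and rectified.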

Exercises 

6.33. A single sampling plan calls for $c = 1$, $n = 20$ when $N = 40$. Draw the OC-curve. Approximate the probabilities by Binomial probabilities and compare the approximated OC-curve with the exact one.

6.34. A single sampling plan, where the lot size is large, calls for $c = 2$ and $n = 40$; (a) find the AQL if the producer's risk is 0.10, (b) find the LTPD if the consumer's risk is 0.15.

6.35. A single sampling plan calls for a sample of size 50. By using a Binomial and a Poisson approximation, (1) find the acceptance number $c$ if the AQL is 5% and the producer's risk is 0.02, (2) by using the $c$ in (1) obtain the consumer's risk if the LTPD is 6%, (3) plot the OC curve and mark the consumer's and producer's risks.

6.36. In a single sampling plan with $n = 20$ and $c = 2$, plot the OC curve and the AOQ curve (assume a Binomial approximation).

CHAPTER 7


SAMPLING FROM NORMAL POPULATIONS 


7.0. Introduction. In chapter 6 we defined a sample from a theoretical population, a statistic, sampling distributions etc. In this chapter we will study sampling from normal populations. From the central limit theorem we have seen that the normal distribution is very important in statistical analysis. In this theorem we have stated that under some general conditions, such as the finiteness of the population mean and variance, the standardized sample mean, that is $(\bar X - E\bar X)/\sqrt{\mathrm{Var}(\bar X)}$, is approximately normally distributed when the sample size is large, whatever may be the population. The population from which the sample is taken need not even be continuous. There are other important results which enhance the importance of the normal distribution. Some such results are mentioned in problem 7.30. Further, in many practical problems where some general conditions are satisfied, it can be shown that a normal distribution is a good fit to the data under consideration. In some non-normal cases appropriate transformations exist by which we can make the transformed variable a normal variable. So samples and sampling distributions from normal populations play a vital role in statistical theory, especially in testing statistical hypotheses.

7.1. A SAMPLE FROM A NORMAL POPULATION 

If the stochastic variables $X_1, \dots, X_n$ are independently and identically distributed as a $N(\mu, \sigma)$ then we say that $X_1, \dots, X_n$ is a simple random sample from a $N(\mu, \sigma)$, and $x_1, \dots, x_n$ is called an observed random sample from a $N(\mu, \sigma)$. (A numerical sample is taken as an observed sample.)

7.2. THE DISTRIBUTION OF THE SAMPLE MEAN 

If $X_1, \dots, X_n$ is a simple random sample from a $N(\mu, \sigma)$ then $\bar X = (X_1 + \dots + X_n)/n$ has a normal distribution with mean $\mu$ and variance $\sigma^2/n$. This can be seen from Ex. 6.1.1. Hence the standardized variable $Y = (\bar X - \mu)/(\sigma/\sqrt{n})$ has a standard normal distribution. The density function for $\bar X$ may be given as

$$f(\bar x) = \left(\frac{n}{2\pi\sigma^2}\right)^{1/2} e^{-n(\bar x-\mu)^2/(2\sigma^2)}, \quad -\infty<\bar x<\infty,\; -\infty<\mu<\infty,\; \sigma>0. \tag{7.1}$$

$E(\bar X) = \mu$ and $\mathrm{Var}(\bar X) = \sigma^2/n$. Therefore probabilities like $P\{\bar X > c\}$, $P\{c < \bar X < d\}$ etc. can be evaluated by using a normal probability table. We have seen that, if the parent population is normal with parameters $\mu$ and $\sigma$, the sample mean has a normal distribution with the parameters $\mu$ and $\sigma/\sqrt{n}$.

Ex. 7.2.1. A dressmaker made the following observations. The waist measurements of 9 girls of a particular age group give an average of 20". If he has enough evidence to justify his assumption that the waist measurements in that age group are distributed as a $N(\mu = 25, \sigma = 2)$, what is the probability that, in the long run, he gets an average greater than 26", if he is taking measurements for batches of 9 girls?

Sol. According to our notation $\mu = 25$, $\sigma = 2$, $n = 9$. The sample mean has a normal distribution with the parameters $\mu = 25$ and $\sigma/\sqrt{n} = 2/3$. That is,

$$f(\bar x) = \left(\frac{9}{8\pi}\right)^{1/2} e^{-9(\bar x-25)^2/8}.$$

The required probability is

$$P\{\bar X \ge 26\} = \int_{26}^{\infty} f(\bar x)\,d\bar x.$$

Let $t = (\bar x - 25)/(2/3)$; then $2\,dt/3 = d\bar x$, and when $\bar x = 26$, $t = 3/2$. Therefore,

$$\int_{26}^{\infty} f(\bar x)\,d\bar x = \int_{3/2}^{\infty} (2\pi)^{-1/2} e^{-t^2/2}\,dt = 0.0668$$

(from normal tables).
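The table look-up in Ex. 7.2.1 can be reproduced with the standard library, as a check on the arithmetic. This is only a sketch: `statistics.NormalDist` stands in for the normal probability table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 25, 2, 9

# Distribution of the sample mean: normal with mean mu and standard deviation sigma / sqrt(n).
sample_mean = NormalDist(mu, sigma / sqrt(n))

# Upper tail beyond 26, i.e. beyond t = 3/2 on the standardized scale.
p_exceeds_26 = 1 - sample_mean.cdf(26)
print(round(p_exceeds_26, 4))
```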

Ex. 7.2.2. A man fishing at a particular spot in the Vembanad lake in Kerala caught 16 fish and the average length was 12". Assuming that the length measurements are a random sample from a $N(\mu, \sigma = 2)$, find $t_0$ and $t_1$ such that we can say with a probability of 0.95 that the expected length $\mu$ lies between $t_0$ and $t_1$, or $P\{t_0 < \mu < t_1\} = 0.95$.

Sol. We can find $t_0$ and $t_1$ by using the property that

$$Y = (\bar X - \mu)/(\sigma/\sqrt{n}) : N(0, 1).$$

In other words, $Y$ has the density function

$$f(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}, \quad -\infty < y < \infty,$$

and the distribution is given in Fig. 7.1.

Fig. 7.1.

From normal probability tables it is seen that

$$\int_{1.96}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy = 0.025. \tag{7.2}$$

Therefore if we take $y_0 = -1.96$ and $y_1 = 1.96$, we have $P\{y_0 < Y < y_1\} = 0.95$, or

$$P\left\{-1.96 < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < 1.96\right\} = 0.95,$$

that is,

$$P\left\{\bar X - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar X + 1.96\frac{\sigma}{\sqrt{n}}\right\} = 0.95.$$

Therefore, $P\{12 - 1.96(2/4) < \mu < 12 + 1.96(2/4)\} = 0.95$.

Hence, $t_0 = 12 - 1.96(2/4) = 11.02$ and $t_1 = 12 + 0.98 = 12.98$.
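The interval of Ex. 7.2.2 can be computed for any confidence level once the normal quantile is available. The sketch below uses `statistics.NormalDist().inv_cdf` in place of the table; the helper name `mean_interval` is an assumption made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval(xbar, sigma, n, level=0.95):
    """Symmetric interval for mu when sigma is known:
    xbar -/+ z * sigma / sqrt(n), with z the upper (1 - level)/2 point of N(0, 1)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

t0, t1 = mean_interval(12, 2, 16)
print(round(t0, 2), round(t1, 2))
```

The symmetric choice of $y_0$ and $y_1$ is only one possibility; any pair of points enclosing probability 0.95 gives a valid interval.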

Ex. 7.2.3. A random sample of 25 pepper plants from a pepper plantation yields an average of 25 lbs. of pepper in a particular year. Assuming that the distribution of yields is a $N(\mu, \sigma = 4)$, will you accept the hypothesis that the expected yield per plant, that is $\mu$, is 30 lbs? Suppose that we are ready to accept the hypothesis if the probability of getting a sample mean as small as the observed one is at least 0.05.

Sol. $t = (\bar X - \mu)/(\sigma/\sqrt{n}) : N(0, 1)$.

If $\mu = 30$ then $(\bar x - \mu)/(\sigma/\sqrt{n}) = \sqrt{25}\,(25 - 30)/4 = -6.25$.

The probability of getting a $t$ as small as $-6.25$ is

$$\int_{-\infty}^{-6.25} (1/2\pi)^{1/2} e^{-t^2/2}\,dt < 0.05$$

(from normal tables).

Hence we reject the hypothesis.
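The decision rule of Ex. 7.2.3 amounts to comparing a lower-tail normal probability with 0.05. A minimal sketch follows; the function name `lower_tail_probability` is hypothetical and the standard library replaces the normal table.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_probability(xbar, mu0, sigma, n):
    """Standardize the observed mean under the hypothesis mu = mu0 and
    return the value together with P{N(0,1) <= that value}."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, NormalDist().cdf(z)

z, p = lower_tail_probability(25, 30, 4, 25)
print(z, "accept" if p >= 0.05 else "reject")
```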

Exercises 

7.1. A random sample of size 20 is taken from a $N(\mu, \sigma)$ where $\mu = 30$ and $\sigma = 4$. What is the probability that the sample mean will fall between 25 and 35?

7.2. A random sample of size 15 is taken from a $N(\mu, \sigma)$ where $\sigma = 2$. What is the probability that $3(\bar x - \mu) \le 4$, where $\bar x$ denotes the sample mean?

7.3. A random sample of size 20 is taken from a $N(\mu, \sigma)$ where $\sigma = 5$. What is the probability that the sample mean will not differ from the population mean by more than 2 in absolute value?

7.4. A random sample of size 40 from a $N(\mu, \sigma)$ where $\sigma = 4$ has a mean 35. Find two quantities $t_0$ and $t_1$ such that $P\{t_0 \le \mu \le t_1\} = 0.99$. Are $t_0$ and $t_1$ unique?

7.5. A random sample of size 30 from a $N(\mu, \sigma)$ where $\sigma = 25$ has a mean 42.5. Find four quantities $t_0, t_1, t_0', t_1'$ such that $P\{t_0 \le \mu \le t_1\} = 0.95 = P\{t_0' \le \mu \le t_1'\}$.

7.6. Two independent random samples of sizes 20 and 25 are taken from two normal populations $N(\mu, \sigma_1 = 2)$ and $N(\mu, \sigma_2 = 5)$ respectively. What is the probability that the sample means will not differ by more than 3 in absolute value?

7.7. Two independent random samples of sizes 15 and 20 from $N(\mu_1, \sigma_1 = 2)$ and $N(\mu_2, \sigma_2 = 3)$ have means 30 and 32 respectively. Find $t_0$ and $t_1$ such that $P\{t_0 \le \mu_1 - \mu_2 \le t_1\} = 0.95$. Are $t_0$ and $t_1$ unique?

7.8. The average yield of corn in 10 experimental plots is 20 bushels. If the distribution of yield is $N(\mu, \sigma)$ with the true average yield $\mu$ and with a standard deviation $\sigma = 4$, will you accept the hypothesis that the true average yield is 18, assuming that we are ready to accept the hypothesis if the probability of getting a sample mean as large as the observed mean is at least 0.05?

7.9. Two independent random samples of 10 boys and 15 girls have average I.Q.'s 104 and 103 respectively. If the I.Q.'s are distributed as $N(\mu_1, \sigma_1 = 4)$ and $N(\mu_2, \sigma_2 = 4)$, will you accept the hypothesis that boys and girls are equally intelligent, taking the same acceptance level as in problem 7.8?

7.3. THE CHI-SQUARE ($\chi^2$) DISTRIBUTION

Another important sampling distribution is the chi-square distribution. If $X : N(0, 1)$, that is, if $X$ has a standard normal distribution, $X^2$ has a gamma distribution with the parameters $\alpha = 1/2$ and $\beta = 2$. This was seen in problem 4.46 of chapter 4. If we consider the independent s.v.'s $X_1, X_2, \dots, X_k$ where $X_i : N(0, 1)$ for $i = 1, 2, \dots, k$, then the statistic $X_1^2 + X_2^2 + \dots + X_k^2$ is called a $\chi^2$ statistic with $k$ degrees of freedom. In other words a $\chi^2$ statistic with $k$ degrees of freedom is the sum of squares of $k$ independent standard normal variates. This $\chi^2$ statistic has a $\chi^2$ distribution and the density function is given by

$$f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2-1} e^{-x/2}, \quad x > 0 \quad (k \text{ a positive integer}) \tag{7.3}$$

$= 0$ elsewhere,

where $k$, the degrees of freedom, is the only parameter. This distribution may be derived as follows.

Let
$$\chi^2 = Y_1 + Y_2 + \dots + Y_k \tag{7.4}$$
where $Y_i = X_i^2$ and $X_i : N(0, 1)$ for $i = 1, 2, \dots, k$ and all the $X$'s are independent.

$Y_i$ has a Gamma distribution with the parameters $\alpha = 1/2$ and $\beta = 2$ for all $i = 1, 2, \dots, k$. The M.G.F. for $Y_i$ is

$$M_{Y_i}(t) = (1 - 2t)^{-1/2}$$

$$M_{\chi^2}(t) = \prod_{i=1}^{k} (1 - 2t)^{-1/2} = (1 - 2t)^{-k/2}. \tag{7.5}$$

By the uniqueness property of moment generating functions, $\chi^2$ has a Gamma distribution with the parameters $\alpha = k/2$ and $\beta = 2$.

Hence the density function is as given in (7.3).

The $\chi^2$ variate with $k$ degrees of freedom has the moment generating function

$$M_{\chi^2}(t) = (1 - 2t)^{-k/2}. \tag{7.6}$$

The distribution is given in Fig. 7.2. The shape of the curve depends upon the degrees of freedom $k$. For $k = 2$ it is the exponential distribution.
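The density (7.3) is easy to evaluate directly, which also makes the remark about $k = 2$ concrete. The following sketch, with an illustrative function name, codes the density and compares it with the exponential density $\tfrac12 e^{-x/2}$.

```python
from math import exp, gamma

def chi2_pdf(x, k):
    """Density (7.3) of a chi-square variate with k degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

if __name__ == "__main__":
    for x in (0.5, 1.0, 3.0):
        # For k = 2 the two columns agree: the density reduces to (1/2) exp(-x/2).
        print(x, chi2_pdf(x, 2), 0.5 * exp(-x / 2))
```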



Fig. 7.2.

Let

$$P\{\chi^2 \ge C_\alpha\} = \alpha = \int_{C_\alpha}^{\infty} f(\chi^2)\,d\chi^2$$

where $C_\alpha$ is a point such that the tail area from $C_\alpha$ to $\infty$, or the area under the curve between the ordinates at $\chi^2 = C_\alpha$ and $\chi^2 = \infty$, is $\alpha$. For any given $\alpha$ and degrees of freedom $k$ we can find $C_\alpha$. This $C_\alpha$ is tabulated for various values of $\alpha$ and of the degrees of freedom $k$. Such a table is called a $\chi^2$ table and an extract is given at the end of this book. A $\chi^2$ distribution has many practical applications, which we will see from the following results.

Theorem 7.1. If $X_1$ and $X_2$ are independently distributed as $\chi^2$ variates with $k_1$ and $k_2$ degrees of freedom respectively, then $Y = X_1 + X_2$ has a $\chi^2$ distribution with $k_1 + k_2$ degrees of freedom.

Proof. The M.G.F.'s of $X_1$ and $X_2$ are

$$M_{X_1}(t) = (1 - 2t)^{-k_1/2}, \qquad M_{X_2}(t) = (1 - 2t)^{-k_2/2}.$$

$$M_Y(t) = M_{X_1}(t)\, M_{X_2}(t) \quad \text{(since } X_1 \text{ and } X_2 \text{ are independent)}$$
$$= (1 - 2t)^{-(k_1 + k_2)/2}.$$

$(1 - 2t)^{-(k_1 + k_2)/2}$ is the M.G.F. of a $\chi^2$ variable with $k_1 + k_2$ degrees of freedom. Therefore, from the uniqueness property of M.G.F.'s, $Y$ has a $\chi^2$ distribution with the parameter $(k_1 + k_2)$.

Corollary 1. If $X_1, X_2, \dots, X_n$ are $n$ independent $\chi^2$ variables with $k_1, k_2, \dots, k_n$ degrees of freedom respectively, then $Y = X_1 + X_2 + \dots + X_n$ is a $\chi^2$ variable with $k_1 + k_2 + \dots + k_n$ degrees of freedom.

Corollary 2. If $X_1$ and $X_2$ are two independent stochastic variables where $X_1$ has a $\chi^2$ distribution with $k_1$ degrees of freedom and $X_1 + X_2$ has a $\chi^2$ distribution with $k_2$ degrees of freedom ($k_2 > k_1$), then $X_2$ has a $\chi^2$ distribution with $k_2 - k_1$ degrees of freedom.

A $\chi^2$ with $k$ degrees of freedom is often denoted by $\chi^2_k$, where the suffix denotes the degrees of freedom. It may be noted that $k$ is a positive integer. We can find a number of statistics which have $\chi^2$ distributions. For example, if $X_1, \dots, X_n$ is a random sample from a $N(\mu, \sigma)$ then

$$\frac{(X_1 - \mu)^2}{\sigma^2} + \dots + \frac{(X_n - \mu)^2}{\sigma^2} \tag{7.8}$$

is evidently the sum of squares of $n$ independent standardized normal variables and hence it is a $\chi^2$ with $n$ degrees of freedom.

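Further,

$$\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar X + \bar X - \mu)^2 = \frac{1}{\sigma^2}\left[\sum_{i=1}^{n} (X_i - \bar X)^2 + n(\bar X - \mu)^2\right] = \frac{nS^2}{\sigma^2} + \frac{(\bar X - \mu)^2}{(\sigma/\sqrt{n})^2} \tag{7.9}$$

where

$$S^2 = \sum_{i=1}^{n} (X_i - \bar X)^2 / n = \text{the sample variance}.$$

The cross-product term vanishes because $\sum_{i=1}^{n} (X_i - \bar X) = 0$.

But $E(\bar X) = \mu$ and $\mathrm{Var}(\bar X) = \sigma^2/n$. Hence $(\bar X - \mu)^2/(\sigma^2/n)$ is a $\chi^2$ with one degree of freedom. By corollary 2 of theorem 7.1, $nS^2/\sigma^2$ is a $\chi^2$ variable with $n - 1$ degrees of freedom, if we can show that $nS^2/\sigma^2$ and $(\bar X - \mu)^2/(\sigma^2/n)$ are independent. This will not be proved here (see reference 3 and problem 7.32). We will state the following theorem without proof.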


Theorem 7.2. If $X_1, X_2, \dots, X_n$ is a random sample from a normal population $N(\mu, \sigma)$ and if

$$S^2 = \sum_{i=1}^{n} (X_i - \bar X)^2 / n$$

is the sample variance, then

$$nS^2/\sigma^2 = \sum_{i=1}^{n} (X_i - \bar X)^2 / \sigma^2 \tag{7.10}$$

has a $\chi^2$ distribution with $n - 1$ degrees of freedom (d.f.).

When the degrees of freedom of a $\chi^2$ variate is sufficiently large, the $\chi^2$ distribution approximates to a normal distribution. When the degrees of freedom $k > 30$ a good normal approximation may be obtained. If $X_1, \dots, X_k$ are $k$ independent normal variables with $E(X_i) = \mu_i$ for all $i$ and $\mathrm{Var}(X_i) = 1$ for all $i$, then the distribution of $Y = X_1^2 + \dots + X_k^2$ is called a non-central $\chi^2$ distribution.
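Theorem 7.2 is stated without proof, but it can be checked empirically. The sketch below simulates $nS^2/\sigma^2$ for repeated normal samples and compares the mean and variance of the simulated values with those of a $\chi^2_{n-1}$ variate, namely $n - 1$ and $2(n - 1)$. The sample size, parameters, seed and number of replicates are arbitrary assumptions for the illustration.

```python
import random

def scaled_sample_variance(n, mu, sigma, rng):
    """Draw one normal sample of size n and return n * S^2 / sigma^2,
    where S^2 uses the divisor n as in the text."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n
    return n * s2 / sigma ** 2

rng = random.Random(0)  # fixed seed so that the run is repeatable
n, replicates = 10, 20000
values = [scaled_sample_variance(n, mu=5.0, sigma=3.0, rng=rng) for _ in range(replicates)]

sim_mean = sum(values) / replicates
sim_var = sum((v - sim_mean) ** 2 for v in values) / replicates
print(round(sim_mean, 2), "vs", n - 1)
print(round(sim_var, 2), "vs", 2 * (n - 1))
```

Agreement of the first two moments is of course not a proof; it only illustrates that the degrees of freedom are $n - 1$ rather than $n$.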


Ex. 7.3.1. An endurance test is conducted on a random sample of 10 persons from a city, to study the ability to stand pain. The sample variance is found to be 16 units. If such batches of 10 are tested, what is the probability that the experimenter gets a sample variance as large as 18, assuming that the endurance measurements are distributed as a $N(\mu, \sigma = 3)$?

Sol. According to our notation $n = 10$, $\sigma = 3$. But $nS^2/\sigma^2$ is a $\chi^2$ with $n - 1 = 9$ d.f., and $ns^2/\sigma^2 = 10 \times 18/9 = 20$ when $s^2 = 18$.

Therefore the required probability is $P\{\chi^2_9 \ge 20\}$. From the $\chi^2$ tables,

$$\int_{20}^{\infty} f(\chi^2_9)\,d\chi^2_9 = 0.02 \text{ approximately}.$$

Hence the required probability $= 0.02$ approximately.
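Where a $\chi^2$ table is not at hand, the tail area can be approximated by numerical integration of the density (7.3). The following is a rough sketch using Simpson's rule; the function names and the number of steps are assumptions, and the routine is intended for $k \ge 3$, where the density is zero at the origin.

```python
from math import exp, lgamma, log

def chi2_pdf(x, k):
    """Density (7.3), computed on the log scale for numerical stability."""
    if x <= 0:
        return 0.0
    return exp((k / 2 - 1) * log(x) - x / 2 - (k / 2) * log(2) - lgamma(k / 2))

def chi2_upper_tail(c, k, steps=2000):
    """P{chi-square_k >= c} as 1 minus a Simpson's-rule integral over [0, c].
    `steps` must be even; assumes k >= 3."""
    h = c / steps
    total = chi2_pdf(0.0, k) + chi2_pdf(c, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, k)
    return 1.0 - total * h / 3

# Ex. 7.3.1: n = 10, sigma = 3, s^2 = 18 gives the observed value 20 with 9 d.f.
p = chi2_upper_tail(10 * 18 / 3 ** 2, 9)
print(round(p, 3))
```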

Ex. 7.3.2. A random sample of 11 metal rods produced by a particular process is tested for breaking strength and it is found that the sample variance is 16 units. Find two quantities $t_0$ and $t_1$ such that we can say with a confidence of 95% that the unknown $\sigma^2$ will lie between $t_0$ and $t_1$, or $P\{t_0 \le \sigma^2 \le t_1\} = 0.95$.

Sol. $nS^2/\sigma^2$ is a $\chi^2$ with $n - 1$ d.f.

Let $y_0$ and $y_1$ be such that

$$\int_{0}^{y_0} f(\chi^2_{10})\,d\chi^2_{10} = 0.025 = \int_{y_1}^{\infty} f(\chi^2_{10})\,d\chi^2_{10}$$

where $f$ is the density function of a $\chi^2$ with 10 d.f. This is shown in Fig. 7.3.

Fig. 7.3.

$$P\{y_0 \le nS^2/\sigma^2 \le y_1\} = 0.95$$

$$P\{y_0 \le 11 s^2/\sigma^2 \le y_1\} = 0.95$$

$$P\left\{\frac{11 s^2}{y_1} \le \sigma^2 \le \frac{11 s^2}{y_0}\right\} = 0.95$$

(since, for positive $a$ and $b$, if $a < b$ then $1/a > 1/b$).

Here $s^2 = 16$ and from chi-square tables $y_0 = 3.247$ and $y_1 = 20.483$.

Therefore, $t_0 = 11 s^2/y_1 = (11)(16)/20.483 = 8.6$

and $t_1 = 11 s^2/y_0 = (11)(16)/3.247 = 54.2$.
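The last step of Ex. 7.3.2 is pure arithmetic once the two table values are known. The sketch below takes the tabulated points $y_0$ and $y_1$ as inputs rather than computing them; the helper name `variance_interval` is an illustrative assumption.

```python
def variance_interval(n, s2, y0, y1):
    """Interval for sigma^2 from P{y0 <= n s^2 / sigma^2 <= y1}:
    inverting the inequalities gives n s^2 / y1 <= sigma^2 <= n s^2 / y0."""
    return n * s2 / y1, n * s2 / y0

# y0 and y1 are the 2.5% and 97.5% points of a chi-square with 10 d.f., read from tables.
t0, t1 = variance_interval(n=11, s2=16, y0=3.247, y1=20.483)
print(round(t0, 1), round(t1, 1))
```

Note that the larger table value $y_1$ produces the lower limit, because it appears in the denominator.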

Ex. 7.3.3. A random sample of 15 fish which could jump over a dam in a river were caught and weighed. A sample variance of 10 units is observed. Assuming that the weights of those fish which could jump over the dam are distributed as a $N(\mu, \sigma)$, will you accept the hypothesis that $\sigma^2 = 6$? Suppose that we will accept the hypothesis if the probability of getting a chi-square as large as the observed one is at least 0.10.

Sol. $nS^2/\sigma^2$ is a chi-square with $n - 1$ degrees of freedom. If $\sigma^2 = 6$ then $ns^2/\sigma^2 = 15(10)/6 = 25$ is an observed value of a $\chi^2$ with 14 degrees of freedom. The probability of getting an observed $\chi^2$ as large as 25 is

$$\int_{25}^{\infty} f(\chi^2_{14})\,d\chi^2_{14} < 0.10 \quad \text{(from chi-square tables)}.$$

Hence the hypothesis is rejected.
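For an even number of degrees of freedom the $\chi^2$ tail area has a closed form, since a $\chi^2_{2m}$ variate is a Gamma variate with integer shape $m$: $P\{\chi^2_{2m} \ge c\} = e^{-c/2}\sum_{j=0}^{m-1} (c/2)^j/j!$. This identity is not used in the text; the sketch below applies it to Ex. 7.3.3 as an alternative to the table look-up, with an illustrative function name.

```python
from math import exp, factorial

def chi2_upper_tail_even(c, k):
    """P{chi-square_k >= c} for even k, via the finite Poisson-type sum."""
    if k % 2 != 0 or k <= 0:
        raise ValueError("this closed form needs a positive even number of d.f.")
    half = c / 2
    return exp(-half) * sum(half ** j / factorial(j) for j in range(k // 2))

observed = 15 * 10 / 6            # n s^2 / sigma^2 under the hypothesis sigma^2 = 6
p = chi2_upper_tail_even(observed, 14)
print(round(p, 3), "accept" if p >= 0.10 else "reject")
```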

Exercises 

7.10. If $X_1, X_2$ is a random sample of size 2 from a $N(0, 1)$, show that the sample mean and the sample variance are independently distributed.

7.11. If $X$ is a $\chi^2$ with $n$ degrees of freedom, show that $\sqrt{2X} - \sqrt{2n}$ is approximately normally distributed when $n$ is sufficiently large.

7.12. A random sample of size 10 is taken from a $N(\mu, \sigma)$ where $\sigma = 5$. What is the probability that the sample variance is as large as 36 or as small as 9?

7.13. A random sample of size 20 is taken from a $N(\mu, \sigma)$ where $\sigma = 10$. What is the probability that the ratio of the sample variance to the population variance will not exceed unity?

7.14. A random sample of size 15 from a $N(\mu, \sigma)$ has a variance 16. Find two quantities $t_0$ and $t_1$ such that $P\{t_0 \le \sigma^2 \le t_1\} = 0.95$. Are $t_0$ and $t_1$ unique?

7.15. Paper bags are filled with peanuts by a machine. A random sample of 4 such bags yields the following data: $\sum x_i = 20$, $\sum x_i^2 = 120$, where $x_i$ is the weight of the $i$th packet. Assuming that the distribution of the weights is normal, what is the probability that the sample came from a population with $\sigma^2 = 0.1$?

7.4. THE STUDENT DISTRIBUTION 

In section 7.2 it is seen that the statistic $Y = (\bar X - \mu)/(\sigma/\sqrt{n})$ is a standard normal variable if $X_1, \dots, X_n$ is a random sample from a normal population $N(\mu, \sigma)$ and if $\bar X$ is the sample mean. But often the population standard deviation $\sigma$ is unknown. In that case we would like to consider some other statistic which does not contain $\sigma$. The distribution of the new statistic may be of some use to us in doing some problems or testing some hypotheses etc. We know that

$$E\left[\frac{\sum_{i=1}^{n} (X_i - \bar X)^2}{n - 1}\right] = \sigma^2. \tag{7.11}$$

If we replace $\sigma$ in $Y$ by the square root of this unbiased estimator for $\sigma^2$ then we get a new statistic called the student $t$ statistic. That is, the student $t$ statistic is

$$t = (\bar X - \mu)/(S'/\sqrt{n}) \quad \text{where} \quad S'^2 = \sum_{i=1}^{n} (X_i - \bar X)^2/(n - 1). \tag{7.12}$$

The distribution of this statistic $t$ is called the student $t$ distribution. This distribution was first given by W.S. Gosset, who used the pen name "Student", and hence the distribution is called a student $t$ distribution. We know that

$$\frac{(n - 1) S'^2}{\sigma^2} = \frac{n S^2}{\sigma^2}$$

has a $\chi^2$ distribution with $n - 1$ degrees of freedom. Hence

$$t = (\bar X - \mu)/(S'/\sqrt{n})$$

is called a student $t$ with $n - 1$ degrees of freedom, and is usually denoted by $t_{n-1}$. The density function of a student $t$ with $\nu$ degrees of freedom is

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2} \quad \text{for } -\infty < t < \infty,\ \nu > 0. \tag{7.13}$$

It is easy to derive this distribution since

$$t_{n-1} = (\bar X - \mu)/(S'/\sqrt{n}) = \frac{(\bar X - \mu)/(\sigma/\sqrt{n})}{\left[\dfrac{(n - 1) S'^2}{\sigma^2 (n - 1)}\right]^{1/2}}. \tag{7.14}$$

The numerator of $t_{n-1}$ is a standard normal variate and the denominator is $[\chi^2_{n-1}/(n - 1)]^{1/2}$, since $(n - 1) S'^2/\sigma^2$ is a $\chi^2_{n-1}$. Further, the numerator and the denominator are independent. This is mentioned in section 7.3; also see problem 7.32. This independence of the sample mean and the sample variance is a characteristic property of the normal distribution. That is, the sample mean and the sample variance are independently distributed if and only if the population is normal. For a proof of this result and for other characteristic properties of the normal distribution see reference 3 at the end of this chapter. In order to obtain the distribution of a student $t$ with $\nu$ degrees of freedom we will consider the distribution of $t = X/\sqrt{Y/\nu}$ where $X$ and $Y$ are independent, $X$ is a standard normal variate and $Y$ is a $\chi^2$ variate with $\nu$ degrees of freedom. The joint density function of $X$ and $Y$ is


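$$f(x, y) = (2\pi)^{-1/2} e^{-x^2/2}\, \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, y^{\nu/2 - 1} e^{-y/2} \quad \text{for } y > 0,\ -\infty < x < \infty \tag{7.15}$$

$= 0$ elsewhere.

Let us consider the transformation

$$t = x/\sqrt{y/\nu}, \qquad u = y. \tag{7.16}$$

The Jacobian of the transformation is

$$J = \begin{vmatrix} \dfrac{\partial x}{\partial t} & \dfrac{\partial x}{\partial u} \\[2mm] \dfrac{\partial y}{\partial t} & \dfrac{\partial y}{\partial u} \end{vmatrix} = \nu^{-1/2}\, u^{1/2}.$$

$$f(t, u) = f(x, y)\, \nu^{-1/2} u^{1/2} = \frac{(2\pi)^{-1/2}\, \nu^{-1/2}}{2^{\nu/2}\,\Gamma(\nu/2)}\, u^{(\nu - 1)/2}\, e^{-\frac{u}{2}\left(1 + \frac{t^2}{\nu}\right)} \quad \text{for } -\infty < t < \infty,\ u > 0 \tag{7.17}$$

$= 0$ elsewhere.

Now integrating out $u$ we get

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2} \quad \text{for } -\infty < t < \infty \tag{7.18}$$

which evidently is symmetric about the $f(t)$-axis, and $t_\alpha$ for which

$$\int_{t_\alpha}^{\infty} f(t)\,dt = \alpha$$

is tabulated for various values of $\alpha$ and for various values of the degrees of freedom $\nu$. Such a table is called a student $t$ table. An extract is given at the end of this book. The student $t$ distribution is given in Fig. 7.4.

Fig. 7.4.

When the degrees of freedom $\nu$ is sufficiently large, the student $t$ distribution approximates to a normal distribution. A good approximation is obtained when $\nu > 30$. If instead of $t = (\bar X - \mu)/(S'/\sqrt{n})$ we take $Y = \bar X/(S'/\sqrt{n})$ then $Y$ is called a non-central $t$ and its distribution is known as a non-central $t$ distribution.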
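The density (7.18) and its normal limit can be inspected numerically. The sketch below, with illustrative function names, evaluates the student $t$ density through log-gamma functions and compares it with the standard normal density for a large number of degrees of freedom; for $\nu = 1$ the formula reduces to the Cauchy density $1/[\pi(1 + t^2)]$, which gives a second check.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, nu):
    """Density (7.18) of a student t with nu degrees of freedom."""
    log_const = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_const - (nu + 1) / 2 * log(1 + t * t / nu))

def standard_normal_pdf(t):
    return exp(-t * t / 2) / sqrt(2 * pi)

if __name__ == "__main__":
    for nu in (1, 5, 30, 1000):
        # The gap to the normal density at t = 1 shrinks as nu grows.
        print(nu, round(t_pdf(1.0, nu), 5), round(standard_normal_pdf(1.0), 5))
```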

Ex. 7.4.1. Bags are filled with an expected weight of 20 lbs of potatoes by an automatic device which can only count the potatoes. A random sample of 9 bags shows an average weight of 18 lbs and $s'^2 = \sum (x_i - \bar x)^2/(n - 1) = 16$ lbs. If samples of 9 bags are taken, what is the probability that one gets a sample with an average weight exceeding 22 lbs, assuming that the weight distribution is a $N(\mu = 20, \sigma)$?

Sol. Here the population standard deviation $\sigma$ is unknown; $t = (\bar X - \mu)/(S'/\sqrt{n})$ has a student $t$ distribution with $n - 1$ d.f. According to our notation $n = 9$, $\mu = 20$, $s' = 4$. When $\bar x = 22$, $t = (22 - 20)/(4/\sqrt{9}) = 3/2$. Hence the required probability is

$$P\{t \ge 3/2\} = \int_{3/2}^{\infty} f(t)\,dt$$

and the d.f. $= n - 1 = 8$.

From $t$-tables $P\{t_8 \ge 3/2\} = 0.085$ approximately.

Ex. 7.4.2. A random sample of size 20 from a $N(\mu, \sigma)$ has a sample mean 25 and a sample variance 16. Will you accept the hypothesis that $\mu = 24$? (Suppose that we are ready to accept the hypothesis if the probability of getting a student $t$ as large as the observed one is at least 0.05.)

Sol. We know that $t = (\bar X - \mu)/(S'/\sqrt{n})$, where $S'^2 = nS^2/(n - 1)$, is a student $t$ with $n - 1$ degrees of freedom.

Here $n = 20$, $s^2 = 16$.

$$s'^2 = ns^2/(n - 1) = 20(16)/19 = 17 \text{ approximately}.$$

If $\mu = 24$, then

$$(\bar x - \mu)/(s'/\sqrt{n}) = (25 - 24)\sqrt{20}/\sqrt{17} = 1.09 \text{ approximately}.$$

If our hypothesis $\mu = 24$ is correct, 1.09 is an observed value of a student $t$ with 19 degrees of freedom.

The probability of getting a $t$ as large as 1.09 is

$$\int_{1.09}^{\infty} f(x)\,dx$$

where $f(x)$ is the density function of a student $t$ with 19 degrees of freedom.

From the student $t$ tables

$$\int_{1.09}^{\infty} f(x)\,dx > 0.05.$$

Here we will accept the hypothesis.
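The tail areas quoted from the $t$ tables in Ex. 7.4.1 and Ex. 7.4.2 can be approximated by integrating the density (7.18) numerically. The sketch below uses Simpson's rule on $[0, c]$ together with the symmetry of the density; the function names and step count are assumptions made for the illustration.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, nu):
    """Density (7.18) of a student t with nu degrees of freedom."""
    log_const = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_const - (nu + 1) / 2 * log(1 + t * t / nu))

def t_upper_tail(c, nu, steps=2000):
    """P{t_nu >= c} for c >= 0: by symmetry, 0.5 minus the integral of f over [0, c].
    `steps` must be even (Simpson's rule)."""
    h = c / steps
    total = t_pdf(0.0, nu) + t_pdf(c, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

# Ex. 7.4.1: t = (22 - 20) / (4 / sqrt(9)) with 8 d.f.
p_741 = t_upper_tail((22 - 20) / (4 / sqrt(9)), 8)

# Ex. 7.4.2: s'^2 = n s^2 / (n - 1), then t = (25 - 24) / (s' / sqrt(n)) with 19 d.f.
n, s2 = 20, 16
t_obs = (25 - 24) / sqrt(n * s2 / (n - 1) / n)
p_742 = t_upper_tail(t_obs, 19)

print(round(p_741, 3), round(t_obs, 2), round(p_742, 3))
```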

Exercises 

7.16. A random sample of size 20 from a $N(\mu, \sigma)$ has a variance 18. If $\mu$ is estimated by the sample mean $\bar X$ then $|\bar X - \mu|$ may be called the error in the estimation. Find the probability that this error will not exceed 2.

7.17. A random sample of 10 citizens in a big city have an average income of $10,000 with a standard deviation of $200. If the income distribution of the citizens in the city is approximately $N(\mu, \sigma)$, obtain a 95% interval estimate for $\mu$, or in other words obtain $t_0$ and $t_1$ such that $P\{t_0 \le \mu \le t_1\} = 0.95$.

7.18. A random sample of 20 university students have an average height of 65" with a standard deviation of 2". If the height measurements of the university students under consideration are assumed to be approximately $N(\mu, \sigma)$, is it reasonable to take a decision that $\mu = 64$"?

7.19. Obtain $E(t)$ and $\mathrm{Var}(t)$ where $t$ is a student $t$ with $n$ degrees of freedom.

7.20. If two independent random samples of sizes 20 and 25 from $N(\mu_1, \sigma)$ and $N(\mu_2, \sigma)$ have variances 16 and 18 respectively, is it reasonable to take a decision that $\mu_1 = \mu_2$, based on the above observations?

7.21. Two independent random samples of sizes 10 and 12 are taken from a $N(\mu, \sigma_1 = 2)$ and a $N(\mu, \sigma_2 = 5)$ respectively. By using Chebyshev's inequality or otherwise obtain a probability limit that $\bar X_1 \ge \bar X_2 + 5$, where $\bar X_1$ and $\bar X_2$ are the sample means of the two samples.

7.22. Two independent random samples ... obtain a probability bound that ... where $\bar X_1$ and $\bar X_2$ denote the sample means.

7.23. A random sample of size ... is taken from a population with mean $\mu$ and standard deviation ... Find out $k$ such that the probability that $|\bar X - \mu| < k$ is 0.68, where $\bar X$ is the sample mean.

7.24. A machine part with a specified diameter of 10 units is produced by a production process. The diameters in a random sample of 5 are 10.001, 10.002, ... If random samples of size 5 are taken, what is the probability that 2 out of 3 random samples of size 5 will have the average diameter between 9.99 and 10.01?

[Hint: Obtain $P\{9.99 < \bar x < 10.01\}$; then apply the Binomial probability law.]

7.5. THE F-DISTRIBUTION

We have seen several important statistics like the $\chi^2$ statistic, the student $t$ statistic etc. Now we will define the $F$ statistic. If we have two independent $\chi^2$ statistics with $m$ and $n$ degrees of freedom respectively, then the ratio

$$F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n} \tag{7.19}$$

is called an $F$ statistic and its distribution is called an $F$-distribution. This distribution is obtained in Ex. 5.5.3. It is also obtained as the distribution of a transformed beta variable in problem 5.31 of Chapter 5. If we consider a student $t$ statistic with $\nu$ degrees of freedom, that is,

$$t_\nu = X / \sqrt{\chi^2_\nu / \nu}$$

where $X : N(0, 1)$ and $\chi^2_\nu$ is a $\chi^2$ with $\nu$ degrees of freedom, $t_\nu^2$ is evidently an $F$-statistic. The numerator of $t_\nu^2$ is a $\chi^2_1 / 1$ and the denominator is a $\chi^2_\nu / \nu$, and further these two $\chi^2$'s are independent by the assumptions in $t_\nu$. Since there are two degrees of freedom attached to an $F$-statistic (that is, the degrees of freedom for the numerator $\chi^2$ and the degrees of freedom for the denominator $\chi^2$) we always say an $F$ with $m$ and $n$ degrees of freedom, where $m$ is the degrees of freedom for the numerator $\chi^2$ and $n$ is the degrees of freedom for the denominator $\chi^2$.

For example $t_\nu^2$, which is mentioned above, is an $F$ with 1 and $\nu$ degrees of freedom. We will use the notation $F_{m,n}$ ($F$ with $m$ and $n$ degrees of freedom). In section 7.3 we have seen that if a random sample of size $n$ from a $N(\mu, \sigma)$ has a mean $\bar X$ and a variance $S^2$ then $nS^2/\sigma^2$ is a $\chi^2$ with $n - 1$ degrees of freedom. Now we can state the following theorem.

Theorem 7.3. If two independent random samples of sizes $n_1$ and $n_2$ from two normal populations $N(\mu_1, \sigma)$ and $N(\mu_2, \sigma)$ have the sample variances $S_1^2$ and $S_2^2$ respectively, then

$$\frac{n_1 S_1^2 / (n_1 - 1)}{n_2 S_2^2 / (n_2 - 1)} = F_{n_1 - 1,\, n_2 - 1} \tag{7.20}$$

or in other words, $[n_1 S_1^2/(n_1 - 1)] / [n_2 S_2^2/(n_2 - 1)]$ has an $F$-distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.

Proof. If $X_1, X_2, \dots, X_{n_1}$ and $Y_1, Y_2, \dots, Y_{n_2}$ denote the two samples,

$$\frac{n_1 S_1^2}{\sigma^2} = \sum_{i=1}^{n_1} \frac{(X_i - \bar X)^2}{\sigma^2} = \chi^2_{n_1 - 1}$$

and

$$\frac{n_2 S_2^2}{\sigma^2} = \sum_{i=1}^{n_2} \frac{(Y_i - \bar Y)^2}{\sigma^2} = \chi^2_{n_2 - 1}.$$

Hence

$$\frac{\dfrac{n_1 S_1^2}{\sigma^2} \Big/ (n_1 - 1)}{\dfrac{n_2 S_2^2}{\sigma^2} \Big/ (n_2 - 1)} = \frac{n_1 S_1^2 / (n_1 - 1)}{n_2 S_2^2 / (n_2 - 1)} = F_{n_1 - 1,\, n_2 - 1}$$

since the two $\chi^2$'s are independent by the assumptions in the theorem.

This interesting result makes the $F$-distribution an important sampling distribution. This distribution is sometimes called the variance-ratio distribution. The density function of an $F_{m,n}$ is given by

$$f(x; \theta) = \frac{\Gamma\left(\frac{m + n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{m/2} x^{m/2 - 1} \left(1 + \frac{m}{n}x\right)^{-(m + n)/2} \quad \text{for } 0 < x < \infty$$

$= 0$ elsewhere,

where $\theta = (m, n)$ and $m$, $n$ are positive integers.

The distribution is given in Fig. 7.5. The shape of the curve varies with the degrees of freedom $m$ and $n$.

Fig. 7.5.

Here $m$ and $n$ are the parameters. The tail area, as shown in the figure, is

$$\int_{F_\alpha}^{\infty} f(x; \theta)\,dx = \alpha.$$

$F_\alpha$ is tabulated for various values of the degrees of freedom $m$ and $n$ and for various values of $\alpha$. Such tables are called $F$-tables. An extract is given at the end of this book. If the numerator $\chi^2$ in an $F$ is a non-central $\chi^2$ then such an $F$ is called a non-central $F$ and its distribution is called a non-central $F$ distribution.
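The remark that $t_\nu^2$ is an $F$ with 1 and $\nu$ degrees of freedom can be verified on the densities: if $T$ has density $f_t$, then $T^2$ has density $f_t(\sqrt{x})/\sqrt{x}$ for $x > 0$, and this should coincide with the $F_{1,\nu}$ density. The sketch below, with illustrative function names, codes both densities and compares them at a few points.

```python
from math import exp, lgamma, log, pi, sqrt

def f_pdf(x, m, n):
    """Density of an F variate with m and n degrees of freedom."""
    if x <= 0:
        return 0.0
    log_const = lgamma((m + n) / 2) - lgamma(m / 2) - lgamma(n / 2) + (m / 2) * log(m / n)
    return exp(log_const + (m / 2 - 1) * log(x) - (m + n) / 2 * log(1 + m * x / n))

def t_pdf(t, nu):
    """Density (7.18) of a student t with nu degrees of freedom."""
    log_const = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_const - (nu + 1) / 2 * log(1 + t * t / nu))

if __name__ == "__main__":
    nu = 7
    for x in (0.25, 1.0, 4.0):
        # Density of t^2 obtained by change of variable, next to the F(1, nu) density.
        print(x, t_pdf(sqrt(x), nu) / sqrt(x), f_pdf(x, 1, nu))
```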

Ex. 7.5.1. Two random samples of 25 and 20 students are taken from students who are interested in higher altitude flying, and their heights are measured. What is the probability that the ratio of the sample variances (first to the second) is at least 3, assuming that the height measurements of such students have an approximately normal distribution?

Sol. Here the samples are from the same Normal population. Hence,

$$\frac{n_1 S_1^2/(n_1-1)}{n_2 S_2^2/(n_2-1)} = \frac{S_1'^2}{S_2'^2} = F_{n_1-1,\,n_2-1}$$

or

$$\frac{n_1}{n_2}\cdot\frac{n_2-1}{n_1-1}\cdot\frac{S_1^2}{S_2^2} \quad \text{is} \quad F_{n_1-1,\,n_2-1}.$$

Here $n_1 = 25$ and $n_2 = 20$, so that

$$\frac{n_1}{n_2}\cdot\frac{n_2-1}{n_1-1} = \frac{25}{20}\cdot\frac{19}{24} = 0.99 \text{ approximately.}$$

Therefore

$$P\left\{\frac{S_1^2}{S_2^2} \ge 3\right\} = P\left\{\frac{n_1(n_2-1)}{n_2(n_1-1)}\cdot\frac{S_1^2}{S_2^2} \ge (0.99)(3) = 2.97\right\} = P\{F_{24,19} \ge 2.97\} = \int_{2.97}^{\infty} f(x)\,dx = 0.01 \text{ (approximately)}$$

where $f(x)$ is the density function for an F with 24 and 19 degrees of freedom, and the probability 0.01 is obtained from an F-table.
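The table look-up in Ex. 7.5.1 can be checked numerically. The following is a minimal sketch, assuming sample sizes 25 and 20 (so that the F has 24 and 19 degrees of freedom); the function names `f_density` and `f_upper_tail` are illustrative, and the tail area is obtained by Simpson's rule rather than from a table.

```python
import math

def f_density(x, m, n):
    """Density of an F variate with m and n degrees of freedom.
    Returning 0 at x <= 0 is adequate here because m > 2."""
    if x <= 0:
        return 0.0
    log_const = (math.lgamma((m + n) / 2) - math.lgamma(m / 2)
                 - math.lgamma(n / 2) + (m / 2) * math.log(m / n))
    log_f = (log_const + (m / 2 - 1) * math.log(x)
             - ((m + n) / 2) * math.log(1 + m * x / n))
    return math.exp(log_f)

def f_upper_tail(c, m, n, steps=4000):
    """P{F >= c}, computed as 1 minus the Simpson's-rule integral over (0, c).
    steps must be even."""
    h = c / steps
    total = f_density(0.0, m, n) + f_density(c, m, n)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_density(k * h, m, n)
    return 1.0 - total * h / 3

n1, n2 = 25, 20
factor = (n1 / n2) * (n2 - 1) / (n1 - 1)    # multiplier of S1^2 / S2^2
threshold = factor * 3                       # value the F variate must exceed
prob = f_upper_tail(threshold, n1 - 1, n2 - 1)
print(round(factor, 2), round(threshold, 2), round(prob, 3))
```

The computed tail area is close to the 0.01 read from the F-table; small differences are to be expected because the table value is rounded.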

Ex. 7.5.2. Two random samples of sizes 15 and 10 from two normal populations $N(\mu_1, \sigma_1)$ and $N(\mu_2, \sigma_2)$ have sample variances 16 and 25 respectively. Do you accept the hypothesis that $\sigma_1 = \sigma_2$?

Sol. If $\sigma_1 = \sigma_2$, $\left[n_1 S_1^2/(n_1-1)\right]/\left[n_2 S_2^2/(n_2-1)\right]$ has an F distribution with $n_1-1$ and $n_2-1$ degrees of freedom. Here $n_1 = 15$, $n_2 = 10$, $s_1^2 = 16$ and $s_2^2 = 25$.

$$\frac{n_1 s_1^2/(n_1-1)}{n_2 s_2^2/(n_2-1)} = \frac{(15)(16)/14}{(10)(25)/9} = 0.6 \text{ approx.}$$

If $\sigma_1 = \sigma_2$, 0.6 is a value assumed by an F with $n_1-1 = 14$ and $n_2-1 = 9$ degrees of freedom. The probability of getting an $F_{14,9}$ as large as 0.6 is

$$\int_{0.6}^{\infty} f(x)\,dx$$

where $f(x)$ is the density function of an F with 14 and 9 degrees of freedom. But this probability is greater than 0.05. (This is seen from the tables.) If we are ready to accept such a hypothesis when the probability of getting an F as large as the observed one is at least 0.05, we will accept our hypothesis that $\sigma_1 = \sigma_2$. The acceptance or rejection depends on the acceptance probability level. This aspect will be discussed in the chapter on Testing Statistical Hypotheses.
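The probability used in Ex. 7.5.2 can also be estimated by simulation instead of tables. This is only a sketch under stated assumptions: both samples are drawn from the same standard normal population (which is what the hypothesis of equal standard deviations amounts to for this ratio), and the seed, the number of repetitions and the helper name `scaled_variance` are arbitrary choices.

```python
import random

def scaled_variance(sample):
    """n S^2 / (n - 1): the sum of squared deviations divided by n - 1."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

n1, n2 = 15, 10
s1_sq, s2_sq = 16.0, 25.0            # observed sample variances (divisor n)
observed = (n1 * s1_sq / (n1 - 1)) / (n2 * s2_sq / (n2 - 1))

rng = random.Random(12345)           # fixed seed so the run is repeatable
reps = 20000
count = 0
for _ in range(reps):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n1)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n2)]
    if scaled_variance(xs) / scaled_variance(ys) >= observed:
        count += 1
p_hat = count / reps                 # estimate of P{F(14, 9) >= observed}
print(round(observed, 3), p_hat)
```

The estimated probability is far above 0.05, in line with the conclusion drawn from the tables.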

Some of the important sampling distributions when a random sample $X_1, X_2, \ldots, X_n$ is taken from a $N(\mu, \sigma)$ are given in the following table.

      Statistic                                                   Distribution

  1.  $X_i$ for any $i$                                           $N(\mu, \sigma)$

  2.  $X_1 + X_2 + \cdots + X_n$                                  $N(n\mu, \sqrt{n}\,\sigma)$

  3.  $\bar{X} = (X_1 + X_2 + \cdots + X_n)/n$                    $N(\mu, \sigma/\sqrt{n})$

  4.  $(X_i - \mu)/\sigma$ for any $i$                            $N(0, 1)$

  5.  $(\bar{X} - \mu)/(\sigma/\sqrt{n})$                         $N(0, 1)$

  6.  $(\bar{X} - \mu)/(S'/\sqrt{n})$, where                      Student $t$ with $n-1$ degrees of freedom.
      $S'^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1)$

  7.  $(X_i - \mu)^2/\sigma^2$ for any $i$                        Gamma with parameters $\alpha = 1/2$ and $\beta = 2$,
                                                                  or a $\chi^2$ with 1 degree of freedom.

  8.  $\sum_{i=1}^{n} (X_i - \mu)^2/\sigma^2$                     Gamma with parameters $\alpha = n/2$ and $\beta = 2$,
                                                                  or a $\chi^2$ with $n$ degrees of freedom.

  9.  $nS^2/\sigma^2$, where                                      $\chi^2$ with $n-1$ degrees of freedom.
      $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/n$

 10.  $\chi^2_{k_1} + \chi^2_{k_2} + \cdots + \chi^2_{k_n}$,      $\chi^2$ with $k_1 + k_2 + \cdots + k_n$ degrees of freedom.
      where $\chi^2_{k_1}, \ldots, \chi^2_{k_n}$ are
      independent $\chi^2$'s

 11.  $\dfrac{\chi^2_m/m}{\chi^2_n/n}$, where $\chi^2_m$ and      F with $m$ and $n$ degrees of freedom.
      $\chi^2_n$ are independent $\chi^2$'s with $m$ and $n$
      d.f. respectively

 12.  $S_1'^2/S_2'^2$, where                                      F with $n_1-1$ and $n_2-1$ d.f.
      $S_1'^2 = \sum (X_i - \bar{X})^2/(n_1-1)$,
      $S_2'^2 = \sum (Y_i - \bar{Y})^2/(n_2-1)$

In item 12, $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ are two independent samples from $N(\mu_1, \sigma)$ and $N(\mu_2, \sigma)$ respectively.
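An entry of this table can be checked empirically. The sketch below simulates the statistic $nS^2/\sigma^2$ of item 9 and compares its sample mean and variance with $n-1$ and $2(n-1)$, the mean and variance of a $\chi^2$ with $n-1$ degrees of freedom. The values of `mu`, `sigma`, `n`, the seed and the number of repetitions are arbitrary assumptions made only for illustration.

```python
import random

rng = random.Random(2024)            # fixed seed so the run is repeatable
mu, sigma, n, reps = 10.0, 3.0, 10, 20000

values = []
for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    s_sq = sum((x - xbar) ** 2 for x in xs) / n    # S^2 with divisor n
    values.append(n * s_sq / sigma ** 2)

mean_val = sum(values) / reps
var_val = sum((v - mean_val) ** 2 for v in values) / (reps - 1)
print(round(mean_val, 2), round(var_val, 2))       # compare with n - 1 and 2(n - 1)
```

A simulation of this kind does not prove the distributional result; it only shows that the first two moments behave as a $\chi^2$ with $n-1$ degrees of freedom would.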

Exercises 

7.25. Obtain E(F) and Var(F) where F is an F statistic with $m$ and $n$ degrees of freedom.

7.26. If $\int_{F_{\alpha, m, n}}^{\infty} f(x)\,dx = \alpha$, where $f(x)$ is the density function of an F statistic with $m$ and $n$ degrees of freedom, show that

$$F_{\alpha, m, n} = \frac{1}{F_{1-\alpha, n, m}}.$$

[Hint : Prove that $F_{m,n}$ and $1/F_{n,m}$ have the same distribution.]

7.27. If two random samples of sizes 15 and 20 are taken from a $N(\mu, \sigma)$, what is the probability that the ratio of the sample variances does not exceed 2 ?

7.28. If X and Y are independent show that $aX$ and $bY$, where $a$ and $b$ are non-zero constants and X and Y are s.v.'s, are also independent.

7.29. If $X_1, \ldots, X_k$ are independent normal variates show that $a_1 X_1 + \cdots + a_k X_k$ is normal where the $a_i$'s are constants.

[Hint : Use moment generating functions.]

7.30. If $X_1, \ldots, X_k$ are independent normal variates with unit variance and if $\sum a_i b_i = 0$, show that $a_1 X_1 + \cdots + a_k X_k$ and $b_1 X_1 + \cdots + b_k X_k$ are independent, where the $a$'s and $b$'s are constants.

[Hint : If X and Y are normal, Cov(X, Y) = 0 implies independence.]

Independence of linear combinations of independent variates is a characteristic property of the normal distribution. It can be proved that if two linear forms $Y_1 = a_1 X_1 + \cdots + a_k X_k$ and $Y_2 = b_1 X_1 + \cdots + b_k X_k$, where the X's are independent s.v.'s, are independent then each $X_i$ for which $a_i b_i \neq 0$ is normally distributed.

For characterizations of normal distributions by properties like this see reference 3 at the end of this chapter.


7.31. Variance Stabilizing Transformation. Let T be a statistic constructed from a sample of size $n$. Let $E(T) = \theta$ and $\mathrm{Var}(T) = \phi(\theta)$. A transformation $T \to g(T)$, or a construction of a function $g(T)$ of T, such that $\mathrm{Var}\, g(T)$ is independent of $\theta$, is called a variance stabilizing transformation. Sometimes such a transformation makes $g(T)$ a normal variate. Under some conditions on T and $\phi(\theta)$ it can be shown that such a transformation is given by

$$g(\theta) = \int \frac{c\, d\theta}{\sqrt{\phi(\theta)}}$$

where $c$ is a constant (independent of $\theta$).

(a) Square root transformation. If X is a Poisson variate with parameter $\lambda$, show that a variance stabilizing transformation is given by $g(X) = \sqrt{X}$.

(b) Inverse sine transformation. If X is a Binomial variate with parameters $p$ and $n$, show that a variance stabilizing transformation for the binomial proportion $X/n$ is given by $g(X/n) = \sin^{-1}\sqrt{X/n}$.

(c) $\tanh^{-1}$ transformation. The variance of a sample correlation coefficient $r$ can be shown to be $(1-\rho^2)^2/n$ where $n$ is sufficiently large, and $\rho$ is the population correlation coefficient. Show that a variance stabilizing transformation for $r$ is given by $g(r) = \tanh^{-1} r = (1/2) \log\left[(1+r)/(1-r)\right]$. This is also known as the z transformation where $z = (1/2) \log\left[(1+r)/(1-r)\right]$.

7.32. Independence of mean and variance in normal samples. Let $X_1, \ldots, X_n$ be a simple random sample of size $n$ from a $N(\mu, \sigma)$.

(a) Write down the joint characteristic function $\phi(t_1, t_2)$ of $\bar{X}$ and $S^2$.

(b) For an orthogonal transformation (that is, a transformation $XA = Y$ where $AA' = I$, X and Y are vectors and A is a matrix, see Chapter 1), show that the Jacobian of the transformation is unity in absolute value.

(c) Show that there exists an orthogonal transformation of $X = (x_1, \ldots, x_n)$ into $Y = (y_1, \ldots, y_n)$ such that

$$\sum_{j=1}^{n} (x_j - \mu)^2 = (y_1 - \sqrt{n}\,\mu)^2 + \sum_{j=2}^{n} y_j^2$$

and

$$\sum_{j=1}^{n} (x_j - \bar{x})^2/n = \sum_{j=2}^{n} y_j^2/n.$$

(d) Now evaluate $\phi(t_1, t_2)$ by using (a), (b) and (c) and show that it can be put in the form $\phi(t_1, t_2) = \phi_1(t_1)\,\phi_2(t_2)$ where $\phi_1$ does not contain $t_2$ and $\phi_2$ does not contain $t_1$.

(e) From (d) show that $\bar{X}$ and $S^2$ are independently distributed.

(f) From (e) obtain the characteristic function of $nS^2/\sigma^2$.

(g) Show that $\bar{X}$ has a Normal distribution and $nS^2/\sigma^2$ has a chi-square distribution with $n-1$ degrees of freedom.

CHAPTER 8


INTERVAL ESTIMATION

8.0. Introduction. In the previous chapters we discussed statistical distributions, sampling and sampling distributions. In the following chapters we will consider the applications of the results derived so far. Statistical methods help us in making a decision in a situation where there is a lack of certainty. This procedure of decision making is usually called statistical inference. Statistical inference may be broadly classified into testing hypotheses and estimation. In this chapter we will consider a special case of estimation problems.

The principle of estimation is to find out estimates for the parameters of a distribution, based on an observed sample from the distribution under consideration. If we give a single quantity as an estimate of a parameter, such an estimate is called a point estimate and the corresponding estimation procedure is called point estimation. This will be discussed in the next chapter. If we estimate an interval such that the interval will cover the true value of the parameter with a certain probability, such an estimation procedure is called interval estimation.


There are many practical situations where we are interested in getting an interval estimate for a parameter. A drug manufacturer may be interested in finding out two numbers so that he can make a statement that, by this new drug, the survival rate will be between 90 and 96%. A toothpaste manufacturer would like to estimate the reduction in cavities so that he can claim that his toothpaste will reduce cavities by 40 to 45%. It is helpful for the Income Tax Department to have an estimate of the range in which the tax return of the succeeding year will lie. There are many such situations where we would like to get interval estimates.


8.1. CONFIDENCE INTERVALS 


If we give an interval estimate for the average I.Q. of all university students on the North American continent and if we say that the average I.Q. is between 105 and 110, we may be making such a statement by observing a random sample of university students. So we can make only a probability statement. We may say that the average I.Q. is between 105 and 110 with a probability of 0.99 etc. This means that our statement will be true in 99% of the cases, or if we go on taking samples of the same size and obtain such intervals, 99% of such intervals will cover the true average I.Q., in the long run. In this case we made a statement with a confidence of 99% and gave an interval (105, 110) which may be called a 99% confidence interval. Such an interval can be given if we know the distribution of the I.Q.'s.

In general if we find out two quantities $t_0$ and $t_1$ based on a random sample from a population $f(x, \theta)$ such that

$$P\{t_0 < \theta < t_1\} = 1 - \alpha \qquad (8.1)$$

we say that $(t_0, t_1)$ is a $100(1-\alpha)\%$ confidence interval for $\theta$; $t_0$ and $t_1$ are called the lower confidence limit and the upper confidence limit respectively and $(1-\alpha)$ is called the confidence coefficient. Such a statement can be made about a parameter by suitably selecting a statistic which contains the parameter under consideration and whose distribution is independent of the parameters. This can be seen from the following examples. But such a statistic need not exist always. Further, in this chapter we will construct confidence intervals only for the parameters in a normal distribution and in a Binomial distribution. The same ideas can be used for setting up confidence intervals for the parameters of other distributions also.


Ex. 8.1.1. An observed random sample of size 9 from a $N(\mu, \sigma = 2)$ has a mean 50. Obtain a 95% confidence interval for $\mu$.

Sol. Here we are asked to make a statement,

$$P\{t_0 < \mu < t_1\} = 0.95,$$

where $t_0$ and $t_1$ are known quantities.

Let us consider the statistic

$$t = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (8.2)$$

This contains the parameter $\mu$. We have an observed value of $\bar{X}$, and $\sigma$ and $n$ are known. Further $t$ is a N(0, 1) and hence its distribution is independent of the parameters. From a normal table we get 1.96 such that

$$P\left\{-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right\} = 0.95 \qquad (8.3)$$

This is illustrated in Fig. 8.1.

Fig. 8.1. [N(0, 1) density with an area of 2.5% cut off at each tail.]

The double inequality

$$-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96 \qquad (8.4)$$

may be written as

$$-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}$$

i.e.,

$$\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}} \qquad (8.5)$$

But in this problem $\bar{x} = 50$, $\sigma = 2$ and $n = 9$ and therefore

$$48.693 < \mu < 51.307$$

or

$$P\{48.693 < \mu < 51.307\} = 0.95 \qquad (8.6)$$

48.693 and 51.307 are the lower and upper 95% confidence limits and (48.693, 51.307) is a 95% confidence interval for $\mu$.
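The arithmetic of Ex. 8.1.1 can be reproduced in a few lines. This is a minimal sketch; the function name `z_interval` is illustrative, and the normal quantile is taken from Python's standard library instead of a printed table, so it uses 1.95996... rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Central confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # cuts off alpha/2 in the upper tail
    half = z * sigma / sqrt(n)
    return xbar - half, xbar + half

lower, upper = z_interval(xbar=50, sigma=2, n=9, confidence=0.95)
print(round(lower, 3), round(upper, 3))
```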

Comments. The probability statement (8.6) does not mean that $\mu$ is a stochastic variable and it will fall in the interval (48.693, 51.307); it means that if we continue taking random samples of size 9 and every time calculating a 95% confidence interval for $\mu$, 95% of our intervals will cover $\mu$, in the long run. Here $\mu$ is an unknown constant : we find an interval such that most probably this unknown value is in this interval.
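The long-run interpretation described in these comments can be illustrated by simulation. In the sketch below the true mean `mu_true` is an assumed value chosen only for the experiment (in practice it is unknown), and the seed and number of repetitions are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist

rng = random.Random(7)                      # fixed seed so the run is repeatable
mu_true, sigma, n, reps = 50.0, 2.0, 9, 10000
z = NormalDist().inv_cdf(0.975)             # 1.96 approximately
half = z * sigma / sqrt(n)                  # half-width of each interval

covered = 0
for _ in range(reps):
    xbar = sum(rng.gauss(mu_true, sigma) for _ in range(n)) / n
    if xbar - half < mu_true < xbar + half:  # does this interval cover mu?
        covered += 1
coverage = covered / reps
print(coverage)
```

The proportion of intervals that cover the fixed value of $\mu$ should settle near 0.95 as the number of repetitions grows; the randomness lies in the intervals, not in $\mu$.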

8.2. THE BEST CONFIDENCE INTERVAL 

In Ex. 8.1.1 we obtained a 95% confidence interval for $\mu$ as (48.693, 51.307). This is obtained by deleting areas equal to 0.025 at both tails of a standard normal distribution (Fig. 8.1). It can be seen that the interval given above is not a unique one. Suppose that we had deleted an area equal to 0.05 at the right end; then we can get two quantities $-\infty$ and 1.64 from a normal table such that

$$P\left\{-\infty < \frac{\mu - \bar{x}}{\sigma/\sqrt{n}} < 1.64\right\} = 0.95 \qquad (8.7)$$

since $(\mu - \bar{X})/(\sigma/\sqrt{n})$ is also a N(0, 1). This leads to the interval

$$(-\infty,\ \bar{x} + 1.64\,\sigma/\sqrt{n}) = (-\infty,\ 51.09) \qquad (8.8)$$

Thus a number of 95% confidence intervals can be constructed for $\mu$. Evidently we would prefer to have the one which is shortest. This is one criterion of comparison of intervals constructed with the same confidence coefficient. For other desirable properties of confidence intervals, such as 'short on the average', 'most selective', 'short unbiased' etc., the reader may refer to the bibliography at the end of this chapter. In a symmetrical distribution it may be easily seen that the central intervals, in the sense of the intervals obtained by deleting equal areas at both ends, usually give the shortest intervals. This is illustrated by Fig. 8.2.

In this illustration if $Z_{\alpha/2}$ is moved to the left so as to cut off an area equal to $(2/3)\alpha$ at the right tail then $-Z_{\alpha/2}$ is moved more to the left so as to make the sum of the two tail areas equal to $\alpha$. This results in a longer interval than the interval corresponding to the omission of equal tail areas at both ends. In the following sections only the central intervals are considered. Here and in the following sections we consider the construction of confidence intervals when there is only one parameter. The case when there is more than one parameter is discussed in section 8.8.
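The claim that unequal tail areas lengthen the interval can be checked numerically for the standard normal. The sketch below compares the central cut points with the split described above, $(2/3)\alpha$ at the right tail and $(1/3)\alpha$ at the left; `cut_points` and `right_share` are illustrative names, and the lengths are in units of $\sigma/\sqrt{n}$.

```python
from statistics import NormalDist

def cut_points(alpha, right_share):
    """Standard normal cut points leaving right_share*alpha in the right
    tail and the rest of alpha in the left tail."""
    std = NormalDist()
    upper = std.inv_cdf(1 - right_share * alpha)
    lower = std.inv_cdf((1 - right_share) * alpha)
    return lower, upper

alpha = 0.05
central = cut_points(alpha, 1 / 2)     # equal tail areas
skewed = cut_points(alpha, 2 / 3)      # (2/3)alpha right, (1/3)alpha left
central_len = central[1] - central[0]
skewed_len = skewed[1] - skewed[0]
print(round(central_len, 3), round(skewed_len, 3))
```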

8.3. CONFIDENCE INTERVALS FOR MEANS 

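Let $x_1, x_2, \ldots, x_n$ be an observed random sample from a $N(\mu, \sigma)$ where $\sigma$ is known. A $100(1-\alpha)\%$ confidence interval may be established for $\mu$ based on the observed sample.

$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} : N(0, 1) \qquad (8.9)$$

Corresponding to any $\alpha$ we can find out a $Z_{\alpha/2}$ such that

$$P\left\{-Z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right\} = 1 - \alpha.$$

This is illustrated in Fig. 8.3.

$Z_{\alpha/2}$ is obtained from a normal probability table.

$$P\left\{-Z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right\} = 1 - \alpha \qquad (8.10)$$

$$P\left\{-Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha$$

$$P\left\{\bar{x} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right\} = 1 - \alpha \qquad (8.11)$$

$\sigma$ is known here and hence the $100(1-\alpha)\%$ confidence limits $\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n}$ and $\bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n}$, for $\mu$, are known. Thus a $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\left(\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n}\right) \qquad (8.12)$$

If $\sigma$ is not known then the statistic

$$\frac{\bar{X} - \mu}{S'/\sqrt{n}} \qquad (8.13)$$

has a student $t$ distribution with $n-1$ degrees of freedom, where $S'^2 = \sum (X_i - \bar{X})^2/(n-1)$. Hence corresponding to any $\alpha$ we can find out a $t_{\alpha/2}$ such that

$$P\left\{-t_{\alpha/2} < \frac{\bar{X} - \mu}{S'/\sqrt{n}} < t_{\alpha/2}\right\} = 1 - \alpha \qquad (8.14)$$

This is illustrated in Fig. 8.4.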


Fig. 8.4. 


Fig. 8.4 gives the distribution of a student $t$ with $n-1$ degrees of freedom.

$$P\left\{-t_{\alpha/2} < \frac{\bar{x} - \mu}{s'/\sqrt{n}} < t_{\alpha/2}\right\} = 1 - \alpha$$

$$P\left\{-t_{\alpha/2}\, s'/\sqrt{n} < \bar{x} - \mu < t_{\alpha/2}\, s'/\sqrt{n}\right\} = 1 - \alpha$$

$$P\left\{\bar{x} - t_{\alpha/2}\, s'/\sqrt{n} < \mu < \bar{x} + t_{\alpha/2}\, s'/\sqrt{n}\right\} = 1 - \alpha$$

The $100(1-\alpha)\%$ confidence interval for $\mu$ is

$$\left(\bar{x} - t_{\alpha/2}\, s'/\sqrt{n},\ \bar{x} + t_{\alpha/2}\, s'/\sqrt{n}\right) \qquad (8.15)$$

where $t_{\alpha/2}$ is obtained from a student $t$ table.

Ex. 8.3.1. A random sample of 25 experimental beef cattle showed an average increase of 35 lbs after administering a new diet. If the experimenter has enough data to justify the assumption that the increase in weight is approximately normal with standard deviation 4, obtain a 95% confidence interval for the expected increase in weight by the new diet.

Sol. According to our notation, the confidence coefficient $1 - \alpha = 0.95$, which implies that $\alpha = 0.05$ or $\alpha/2 = 0.025$.

Consider the statistic

$$t = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (8.16)$$

But $t$ is a N(0, 1), and thus the distribution of $t$ is independent of the parameter $\mu$,

$$f(t) = (2\pi)^{-1/2}\, e^{-t^2/2}, \quad -\infty < t < \infty \qquad (8.17)$$

But,

$$\int_{-\infty}^{-1.96} f(t)\,dt = 0.025 = \int_{1.96}^{\infty} f(t)\,dt. \qquad (8.18)$$

1.96 is obtained from a normal probability table. Therefore,

$$P\left\{-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right\} = 0.95. \qquad (8.19)$$

That is,

$$P\left\{\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}\right\} = 0.95. \qquad (8.20)$$

Here $\bar{x} = 35$, $\sigma = 4$, and $n = 25$.

Therefore a 95% confidence interval for $\mu$ is,

$$\left(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n}\right) = (33.43,\ 36.57) \qquad (8.21)$$

Comments. According to our results, $P\{33.43 < \mu < 36.57\} = 0.95$ does not mean that $\mu$ is a stochastic variable and the probability that it will fall between 33.43 and 36.57 is 0.95. The meaning of our confidence statement is as follows : if sampling is continued, each time taking a random sample of size 25, we can evaluate the interval $(\bar{x} - 1.96\,\sigma/\sqrt{n},\ \bar{x} + 1.96\,\sigma/\sqrt{n})$ for every sample. In the long run 95% of the intervals will cover the true value of the parameter $\mu$, or 95% of the intervals will contain $\mu$. If we make a statement that $\mu$ lies in one of these intervals we will be wrong in 5% of the cases, in the long run.

Ex. 8.3.2. A dress-maker finds that a random sample of 16 girls of a certain age group in a particular city shows an average bust measurement of 40" with a variance of 25. If he has enough evidence to assume that the bust measurements are normally distributed, obtain a 99% interval estimate for the average bust measurement of girls in that age group in that city.

Sol. According to our notation $n = 16$, $\bar{x} = 40$ and $s^2 = 25$. In this problem the population variance $\sigma^2$ is not known and hence we will consider the statistic,

$$t = \frac{\bar{X} - \mu}{S'/\sqrt{n}} \qquad (8.22)$$

where $S'^2 = \sum (X_i - \bar{X})^2/(n-1)$ and $t$ is a student $t$ with $n-1$ degrees of freedom, and further the distribution of $t$ is independent of $\mu$ and the 'nuisance' parameter $\sigma$.

$$1 - \alpha = 0.99 \Rightarrow \alpha/2 = 0.005.$$

From a student $t$ table corresponding to $n - 1 = 16 - 1 = 15$ degrees of freedom, we get the values $-2.947$ and $2.947$ such that

$$\int_{-\infty}^{-2.947} f(t)\,dt = 0.005 = \int_{2.947}^{\infty} f(t)\,dt \qquad (8.23)$$

where $f(t)$ is the density function of a student $t$ with 15 degrees of freedom.

Hence

$$P\left\{-2.947 < \frac{\bar{x} - \mu}{s'/\sqrt{n}} < 2.947\right\} = 0.99 \qquad (8.24)$$

i.e.

$$P\left\{\bar{x} - 2.947\,\frac{s'}{\sqrt{n}} < \mu < \bar{x} + 2.947\,\frac{s'}{\sqrt{n}}\right\} = 0.99. \qquad (8.25)$$

Here $\bar{x} = 40$, $n = 16$ and $s^2 = 25 = \sum (x_i - \bar{x})^2/n$.

But $ns^2/(n-1) = s'^2 = (16)(25)/15 = 26.67$, so that $s'/\sqrt{n} = \sqrt{26.67/16} = 1.291$ approximately.

A 99% confidence interval for $\mu$ is

$$\left(\bar{x} - 2.947\,\frac{s'}{\sqrt{n}},\ \bar{x} + 2.947\,\frac{s'}{\sqrt{n}}\right) = (36.20,\ 43.80) \qquad (8.26)$$

Comments. Selection of the appropriate statistic is the most important problem. The statistic should contain the parameter under consideration and the distribution of the statistic should be independent of the parameter. It may be noticed that the 99% confidence interval is not unique. From a student $t$ table with 15 degrees of freedom, we can find out a quantity 2.602 such that

$$\int_{2.602}^{\infty} f(t)\,dt = 0.01 \Rightarrow \int_{-\infty}^{2.602} f(t)\,dt = 0.99$$

i.e.,

$$P\left\{-\infty < \frac{\mu - \bar{x}}{s'/\sqrt{n}} < 2.602\right\} = 0.99$$

since $(\mu - \bar{X})/(S'/\sqrt{n})$ is also a student $t$ with 15 degrees of freedom, i.e.,

$$P\left\{-\infty < \mu < \bar{x} + 2.602\,\frac{s'}{\sqrt{n}}\right\} = 0.99.$$

Here a 99% confidence interval for $\mu$ is obtained as

$$\left(-\infty,\ \bar{x} + 2.602\,\frac{s'}{\sqrt{n}}\right) = (-\infty,\ 43.36).$$

Evidently this interval is larger than the one we obtained before. Central intervals obtained from equations of the type

$$P\{-t_{\alpha/2} < T < t_{\alpha/2}\} = 1 - \alpha \qquad (8.27)$$

are usually the shortest in symmetric distributions, where T is the statistic under consideration.
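The limits in Ex. 8.3.2 can be recomputed directly from the stated data ($n = 16$, $\bar{x} = 40$, $s^2 = 25$ with divisor $n$). This is a minimal sketch: the two $t$ values are the table values quoted in the example (15 degrees of freedom), entered by hand because the Python standard library has no Student $t$ quantile function.

```python
from math import sqrt

n, xbar, s_sq = 16, 40.0, 25.0       # s_sq is the sample variance with divisor n
t_two_sided = 2.947                  # table value: 0.005 in each tail, 15 d.f.
t_one_sided = 2.602                  # table value: 0.01 in the upper tail, 15 d.f.

s_prime_sq = n * s_sq / (n - 1)      # variance with divisor n - 1
std_error = sqrt(s_prime_sq / n)     # s' / sqrt(n)

lower = xbar - t_two_sided * std_error
upper = xbar + t_two_sided * std_error
one_sided_upper = xbar + t_one_sided * std_error
print(round(s_prime_sq, 2), round(lower, 2), round(upper, 2), round(one_sided_upper, 2))
```

The one-sided upper limit is smaller than the upper limit of the central interval, but the one-sided interval is unbounded below and therefore longer.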

Exercises 


(8.27) 


8.1. A random sample of size 50, taken from a $N(\mu, \sigma = 5)$, has a mean 40. Obtain a 95% confidence interval for $2\mu + 3$.

8.2. The average height of 20 university students is 65". Assuming these observations as a random sample from a $N(\mu, \sigma = 2)$ obtain a 90% interval estimate of the average height of all the university students from which the sample is taken.

8.3. A random sample of 25 citizens in a country shows an average annual income of $10,000 with a standard deviation of $200. Assuming the income distribution is a $N(\mu, \sigma)$, obtain a 90% interval estimate for the average income of the citizens in the country.

8.4. Ten bullets from an enemy gun have an average diameter of 5 units with a standard deviation of 0.02 units. Assuming that this sample is a random sample from a $N(\mu, \sigma)$, obtain a 99% interval estimate of the diameter of the enemy gun barrel, taking the diameter of the gun barrel $= \mu + 0.01$. Is it possible to get an interval estimate if (1) only one bullet is available, (2) only one bullet is available, but $\sigma$ is known to be 0.03 ?

8.5. A random sample of 40 chickens from a farm has an average weight of 4 lbs with a standard deviation of 0.5 lb. Assuming that the sample can be considered to be from a $N(\mu, \sigma)$, give a 99% interval estimate of the expected income of the farmer on the average per chicken if chickens are sold at $0.25 a lb.

8.6. A random sample of 35 days shows an average increase of $50 with a standard deviation of $10 in sales after the appointment of a new sales girl. Assuming that the increase in sales can be considered to be distributed as a $N(\mu, \sigma)$, obtain a 95% interval estimate for the expected increase in sales per day after the new appointment.

8.7. A survey conducted on a particular day among a random sample of 20 students from a particular university shows that they have spent on the average 5.5 hours for studies with a standard deviation of 0.5 hour. Assuming that the number of hours spent on that day by the students of that university has a $N(\mu, \sigma)$, obtain a 95% confidence interval for the expected decrease in the expenses, taking the decrease in expenses = (0.1 times the number of hours spent for studies on that day).

8.4. CONFIDENCE INTERVALS FOR DIFFERENCE 

BETWEEN MEANS 

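Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ be two independent random samples from the normal populations $N(\mu_1, \sigma_1)$ and $N(\mu_2, \sigma_2)$ respectively, where $\sigma_1$ and $\sigma_2$ are known. Based on these samples we can construct $100(1-\alpha)\%$ confidence intervals for $\mu_1 - \mu_2$.

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} : N(0, 1) \qquad (8.28)$$

Hence corresponding to any $\alpha$ we can find out a $Z_{\alpha/2}$ from normal tables such that

$$P\left\{-Z_{\alpha/2} < \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} < Z_{\alpha/2}\right\} = 1 - \alpha \qquad (8.29)$$

$$P\left\{(\bar{x} - \bar{y}) - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x} - \bar{y}) + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right\} = 1 - \alpha \qquad (8.30)$$

Here $\sigma_1$ and $\sigma_2$ are known; then a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is

$$\left(\bar{x} - \bar{y} - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ \ \bar{x} - \bar{y} + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) \qquad (8.31)$$

If $\sigma_1 = \sigma_2 = \sigma$ and if $\sigma$ is unknown then

$$S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}$$

is an unbiased estimator for $\sigma^2$ and hence,

$$t_{n_1+n_2-2} = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S\sqrt{(1/n_1) + (1/n_2)}} \qquad (8.32)$$

is a student $t$ with $n_1 + n_2 - 2$ degrees of freedom, and if $n_1 + n_2 - 2$ is sufficiently large $t_{n_1+n_2-2}$ has an approximate standard normal distribution, where $S_1^2 = \sum_{i=1}^{n_1} (X_i - \bar{X})^2/n_1$ and $S_2^2 = \sum_{i=1}^{n_2} (Y_i - \bar{Y})^2/n_2$. Therefore a $100(1-\alpha)\%$ confidence interval for $\mu_1 - \mu_2$ is given by

$$\left(\bar{x} - \bar{y} - t_{\alpha/2}\, s\sqrt{1/n_1 + 1/n_2},\ \ \bar{x} - \bar{y} + t_{\alpha/2}\, s\sqrt{1/n_1 + 1/n_2}\right) \qquad (8.33)$$

where $t_{\alpha/2}$ is obtained from a student $t$ table corresponding to $n_1 + n_2 - 2$ degrees of freedom. If $n_1 + n_2 - 2$ is sufficiently large a $100(1-\alpha)\%$ confidence interval for $(\mu_1 - \mu_2)$ is given by

$$\left(\bar{x} - \bar{y} \pm z_{\alpha/2}\, s\sqrt{1/n_1 + 1/n_2}\right) \qquad (8.34)$$

where $z_{\alpha/2}$ is obtained from a normal table. (A good approximation to the normal is obtained when $n_1 + n_2 - 2 > 30$.)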

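Ex. 8.4.1. A farmer made the following observations. Random samples of 10 and 12 newly planted rubber plants of two varieties gave the average growths in the first week as 25" and 24" respectively. If he has evidence to assume that the growths are distributed as $N(\mu_1, \sigma_1 = 2)$ and $N(\mu_2, \sigma_2 = 3)$ respectively, obtain a 95% confidence interval for the expected growth difference in the first week.

Sol. According to our notation $n_1 = 10$, $n_2 = 12$, $\sigma_1 = 2$, $\sigma_2 = 3$, $\bar{x}_1 = 25$ and $\bar{x}_2 = 24$.

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^{1/2}} : N(0, 1)$$

Therefore from the normal tables we get a quantity 1.96 such that

$$P\left\{-1.96 < \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^{1/2}} < 1.96\right\} = 0.95$$

i.e.,

$$P\left\{(\bar{x}_1 - \bar{x}_2) - 1.96\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^{1/2} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + 1.96\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^{1/2}\right\} = 0.95.$$

A 95% confidence interval for $\mu_1 - \mu_2$ is

$$\left(\bar{x}_1 - \bar{x}_2 - 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ \ \bar{x}_1 - \bar{x}_2 + 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right) = (-1.097,\ 3.097) \qquad (8.35)$$

Comments. It may be noticed that this 95% confidence interval is not a unique one. We could have found two values $t_0$ and $t_1$ such that

$$P\left\{t_0 < \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^{1/2}} < t_1\right\} = 0.95 \qquad (8.36)$$

This would have led to a different interval estimate if $t_0$ and $t_1$ were different from $-1.96$ and $1.96$ respectively. If $\sigma_1$ and $\sigma_2$ are not given, these 'nuisance' parameters can be avoided by taking the sample variances as estimates for $\sigma_1^2$ and $\sigma_2^2$ when the sample sizes are large.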
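The interval of Ex. 8.4.1 follows from formula (8.31) and can be computed as below. The function name `diff_means_z_interval` is illustrative. Carrying the square root at full precision gives limits of about $-1.102$ and $3.102$; the values $-1.097$ and $3.097$ in the text correspond to rounding $\sqrt{4/10 + 9/12}$ to 1.07 before multiplying by 1.96.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_z_interval(x1, x2, sigma1, sigma2, n1, n2, confidence):
    """Central interval for mu1 - mu2 when both standard deviations are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = x1 - x2
    return diff - half, diff + half

lower, upper = diff_means_z_interval(25, 24, 2, 3, 10, 12, 0.95)
print(round(lower, 3), round(upper, 3))
```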


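Ex. 8.4.2. Two independent random samples of sizes 10 and 12 from the populations $N(\mu_1, \sigma)$ and $N(\mu_2, \sigma)$ have means 50 and 45 and variances 16 and 25 respectively. Obtain a 90% confidence interval for $\mu_1 - \mu_2$.

Sol. According to our notation

$$n_1 = 10,\ n_2 = 12,\ \bar{x}_1 = 50,\ \bar{x}_2 = 45,\ s_1^2 = 16,\ s_2^2 = 25,\ \alpha/2 = 0.05.$$

$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$$

and

$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$

and $\sigma^2$ may be estimated by

$$S^2 = \left(n_1 S_1^2 + n_2 S_2^2\right)/(n_1 + n_2 - 2).$$

Hence

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad (8.37)$$

is a student $t$ statistic with $n_1 + n_2 - 2 = 20$ degrees of freedom. From a student $t$ table, corresponding to 20 degrees of freedom we get a value 1.725, such that

$$\int_{-\infty}^{-1.725} f(t)\,dt = 0.05 = \int_{1.725}^{\infty} f(t)\,dt \qquad (8.38)$$

where $f(t)$ is the density function of a student $t$ with 20 degrees of freedom. Therefore

$$P\left\{-1.725 < \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} < 1.725\right\} = 0.90 \qquad (8.39)$$

i.e.,

$$P\left\{(\bar{x}_1 - \bar{x}_2) - 1.725\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + 1.725\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\right\} = 0.90.$$

Here $s^2 = \left[(10)(16) + (12)(25)\right]/20 = 23$ and $\sqrt{s^2(1/n_1 + 1/n_2)} = \sqrt{(23)(11/60)} = 2.053$ approximately, so that

$$P\{1.458 < \mu_1 - \mu_2 < 8.542\} = 0.90 \qquad (8.40)$$

Hence a 90% confidence interval for $\mu_1 - \mu_2$ is (1.458, 8.542).

Comments. If $n_1 + n_2 - 2 > 30$, instead of a student $t$ we may use a normal approximation. If the population variances are different, for large sample sizes $n_1$ and $n_2$, the statistic

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^{1/2}} \qquad (8.41)$$

is approximately a N(0, 1). Hence in such a case this approximation may be used to set up confidence intervals.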
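The pooled-variance computation of Ex. 8.4.2 can be traced step by step from the stated data, using the estimator $S^2 = (n_1 S_1^2 + n_2 S_2^2)/(n_1 + n_2 - 2)$ of this section. This is a minimal sketch; the value 1.725 is the table value quoted in the example (20 degrees of freedom), entered by hand because the standard library has no Student $t$ quantile function.

```python
from math import sqrt

n1, n2 = 10, 12
x1, x2 = 50.0, 45.0
s1_sq, s2_sq = 16.0, 25.0            # sample variances with divisors n1 and n2
t_value = 1.725                      # table value: 0.05 in each tail, 20 d.f.

pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)   # estimate of sigma^2
std_error = sqrt(pooled * (1 / n1 + 1 / n2))
half = t_value * std_error

lower = (x1 - x2) - half
upper = (x1 - x2) + half
print(pooled, round(std_error, 3), round(lower, 3), round(upper, 3))
```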

Exercises 

N(u „ 88 '„ I ° d ^ ndent T? 0m Samples of sizes 20 and 25 taken from 

2 ") anS 50 and 46 o™*™* 

8 . 9 . Independent random samples of 10 girls and 12 bovs of a certain 

ifX^T aT . erage I,Q ’l 104 “ d 103 ' with standard deviation. ifanS 
®. K ” r Assuming that the I.Q’s are distributed as N(,s. <,) and 

bSf interval estimate for tha ~ d diff - 

people of^a'certain r profes6ion U ba e two°cities ld 6hows m th eS ° f 8izeS - 50 and 70 
$100 and fifio -non °, c , 8 ' snows the average incomes as 

*» «» -peot.d 

[Hint. Use a normal approximation]. 

production proces^a-f^n 88 ^™?^ 0 ^ s ^ nws the average output of a 

standard de P viatfons 5 units an^ A and 35 Units ^ ™^od P B with 

respectively. Under the assumption of normality obtain a 99% interval estimate for the expected difference in the output by the two methods.

8.5. CONFIDENCE INTERVALS FOR PROPORTIONS

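Let us consider a binomial probability situation. Let $p$ be the probability of a success in any trial and let $x$ be the number of successes observed in N trials. Then $\hat{p} = x/N$ is the observed proportion of successes.

But

$$E(\hat{p}) = E\left(\frac{x}{N}\right) = p \quad \text{and} \quad \mathrm{Var}(\hat{p}) = p(1-p)/N \qquad (8.42)$$

For large N, $(\hat{p} - p)\big/\sqrt{p(1-p)/N}$ is approximately normally distributed, or

$$\frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{N}}} : N(0, 1) \qquad (8.43)$$

Corresponding to any given $\alpha$, $z_{\alpha/2}$ can be found out from normal tables, such that

$$P\left\{-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{N}}} < z_{\alpha/2}\right\} = 1 - \alpha \qquad (8.44)$$

The inequality

$$-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{N}}} < z_{\alpha/2} \qquad (8.45)$$

upon simplification, may be transformed to

$$\frac{x + \frac{1}{2} z_{\alpha/2}^2 - z_{\alpha/2}\sqrt{\dfrac{x(N-x)}{N} + \dfrac{1}{4} z_{\alpha/2}^2}}{N + z_{\alpha/2}^2} < p < \frac{x + \frac{1}{2} z_{\alpha/2}^2 + z_{\alpha/2}\sqrt{\dfrac{x(N-x)}{N} + \dfrac{1}{4} z_{\alpha/2}^2}}{N + z_{\alpha/2}^2} \qquad (8.46)$$

Hence a $100(1-\alpha)\%$ confidence interval for $p$ is

$$\frac{x + \frac{1}{2} z_{\alpha/2}^2 \pm z_{\alpha/2}\sqrt{\dfrac{x(N-x)}{N} + \dfrac{1}{4} z_{\alpha/2}^2}}{N + z_{\alpha/2}^2} \qquad (8.47)$$

When N is sufficiently large $(\hat{p} - p)\big/\sqrt{\hat{p}(1-\hat{p})/N}$ may be assumed to be approximately N(0, 1). Then

$$P\left\{-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/N}} < z_{\alpha/2}\right\} = 1 - \alpha \qquad (8.48)$$

That is,

$$P\left\{\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/N} < p < \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/N}\right\} = 1 - \alpha.$$

Hence a $100(1-\alpha)\%$ confidence interval for $p$ is,

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/N} \qquad (8.49)$$

If the total number of trials N is not large, corresponding to a given $\alpha$ we can find out numbers $t_0$ and $t_1$ from binomial probability tables so that,

$$P\{t_0 < p < t_1\} = 1 - \alpha \qquad (8.50)$$

It is seen that transformation of equation (8.45) to (8.46) involves the problem of solving quadratic equations. Sometimes such a separation of the parameter under consideration poses greater difficulties. In such situations a graphical representation usually gives some ideas about the confidence intervals.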
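The step from (8.45) to (8.46) amounts to solving the quadratic $(\hat{p} - p)^2 = z_{\alpha/2}^2\, p(1-p)/N$ for $p$. The sketch below evaluates the two roots in the form of (8.47) and checks that each root makes the standardized statistic equal to $\pm z_{\alpha/2}$. The data `x = 7`, `N = 20` are hypothetical values chosen only for illustration, and the function names are not from the text.

```python
from math import sqrt
from statistics import NormalDist

def quadratic_interval(x, N, confidence):
    """Roots in p of (x/N - p)^2 = z^2 p (1 - p) / N, written as in (8.47)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    centre = x + z * z / 2
    spread = z * sqrt(x * (N - x) / N + z * z / 4)
    denom = N + z * z
    return (centre - spread) / denom, (centre + spread) / denom, z

def standardized(p_hat, p, N):
    """The statistic of (8.43) evaluated at a trial value p."""
    return (p_hat - p) / sqrt(p * (1 - p) / N)

x, N = 7, 20                                   # hypothetical data
lower, upper, z = quadratic_interval(x, N, 0.95)
print(round(lower, 4), round(upper, 4))
print(round(standardized(x / N, lower, N), 4), round(standardized(x / N, upper, N), 4))
```

At the lower limit the statistic equals $+z_{\alpha/2}$ and at the upper limit it equals $-z_{\alpha/2}$, which is exactly the boundary of the double inequality (8.45).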

Ex. 8.5.1. In a random sample of 100 articles, 10 are found defective. Obtain a 95% confidence interval for the true proportion of defectives in the population of such articles under consideration.

Sol. The proportion $\hat{p}$ of defectives in the sample

$$= \frac{10}{100} = 0.1$$

Here the sample size N is large and hence we can assume that the statistic

$$\frac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1-\hat{p})}{N}}}$$

is approximately a N(0, 1).

Therefore from normal probability tables, we obtain 1.96 such that

$$P\left\{-1.96 < \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/N}} < 1.96\right\} = 0.95$$

i.e.,

$$P\left\{\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{N}} < p < \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{N}}\right\} = 0.95.$$

Here $\sqrt{\hat{p}(1-\hat{p})/N} = \sqrt{(0.1)(0.9)/100} = 0.03$. Hence a 95% confidence interval for $p$ is

$$\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{N}},\ \ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{N}}\right) = (0.041,\ 0.159) \qquad (8.51)$$

Comments. When N is not large we can obtain two values $t_0$ and $t_1$ by using a table of binomial probabilities, such that

$$P\{t_0 < p < t_1\} = 0.95.$$

This often involves complications since we may not be able to find out a statistic involving $p$ with its distribution independent of $p$. In such a case we can adopt a general procedure. For additional reading in this line see reference [5] at the end of this chapter.
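The large-sample interval (8.49) for the data of Ex. 8.5.1 ($x = 10$ defectives in $N = 100$ articles) can be computed as follows. This is a minimal sketch that takes the normal quantile from the standard library rather than from a table.

```python
from math import sqrt
from statistics import NormalDist

x, N = 10, 100
p_hat = x / N                                  # observed proportion of defectives
z = NormalDist().inv_cdf(0.975)                # 1.96 approximately

half = z * sqrt(p_hat * (1 - p_hat) / N)       # z * sqrt(p_hat (1 - p_hat) / N)
lower, upper = p_hat - half, p_hat + half
print(round(half, 4), round(lower, 3), round(upper, 3))
```

Both limits lie between 0 and 1, as any sensible interval for a proportion should; a negative lower limit would be a sign that the standard error had been computed with the wrong sample size.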

Exercises 


8.12. A random sample of 40 articles of a particular type shows that
fi of them do not meet quality specifications. Obtain a 99% interval esti¬ 
mate for the expected number of defective ones in a shipment of 10,000 such 
articles. 

8.13. A random sample of 100 seagull eggs collected from an island
shows that 10 of them are spoiled or will not hatch. If there are 10,000 eggs on
that island, obtain a 9 >% interval estimate for the expected number of chicks. 

8.14. Two random samples of sizes 100 each of a particular article
from two production processes show that 2% are defective by one process and
3% are defective by the other process. Obtain a 90% interval estimate for 
the expected difference in the proportion of defective articles by the two 
processes. 

8.15. A survey conducted on two random samples of sizes 100 each 
shows that the survival rates from a disease, is 90% by drug A and 95% by 
drug B. Obtain a 99% interval estimate for the expected difference in the 
survival rates by the two drugs. 

8.6. CONFIDENCE INTERVAL FOR VARIANCE 


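Let $X_1, X_2, \ldots, X_n$ be a random sample from a $N(\mu, \sigma)$. We shall construct $100(1-\alpha)\%$ confidence intervals for $\sigma^2$.

$$nS^2/\sigma^2 : \chi^2_{n-1} \qquad (8.52)$$

where $S^2 = \sum (X_i - \bar{X})^2/n$ and $\chi^2_{n-1}$ is a $\chi^2$ with $n-1$ degrees of freedom. From a $\chi^2$ table we can find out two values $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ such that

$$P\left\{\chi^2_{1-\alpha/2} < \frac{nS^2}{\sigma^2} < \chi^2_{\alpha/2}\right\} = 1 - \alpha.$$

This is illustrated in Fig. 8.5.

Fig. 8.5.

The double inequality

$$\chi^2_{1-\alpha/2} < \frac{ns^2}{\sigma^2} < \chi^2_{\alpha/2}$$

may be written as

$$\frac{1}{\chi^2_{\alpha/2}} < \frac{\sigma^2}{ns^2} < \frac{1}{\chi^2_{1-\alpha/2}} \qquad (8.53)$$

i.e.,

$$\frac{ns^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{ns^2}{\chi^2_{1-\alpha/2}} \qquad (8.54)$$

Hence a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is,

$$\left(ns^2/\chi^2_{\alpha/2},\ \ ns^2/\chi^2_{1-\alpha/2}\right) \qquad (8.55)$$

or a $100(1-\alpha)\%$ confidence interval for $\sigma$ may be given as,

$$\left(\sqrt{ns^2/\chi^2_{\alpha/2}},\ \ \sqrt{ns^2/\chi^2_{1-\alpha/2}}\right) \qquad (8.56)$$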

Ex. 8.6.1. A random sample of 20 incomes shows a variance of $36.
Assuming that the income distribution is a \(N(\mu, \sigma)\), obtain a
95% interval estimate for \(\sigma^2\).

Sol. Here \(n = 20\), \(s^2 = 36\), \(1 - \alpha = 0.95 \Rightarrow \alpha/2 = 0.025\).

\[ nS^2/\sigma^2 \sim \chi^2_{n-1}. \]

From a \(\chi^2\) table with \(n - 1 = 19\) d.f., we get the quantities 8.907
and 32.852 such that

\[ \int_0^{8.907} f(x)\,dx = 0.025 = \int_{32.852}^{\infty} f(x)\,dx \tag{8.57} \]

where \(f\) is the density function of a chi-square with 19 d.f. Therefore

\[ P\{8.907 < nS^2/\sigma^2 < 32.852\} = 0.95. \]

That is, \(P\{nS^2/32.852 < \sigma^2 < nS^2/8.907\} = 0.95\).

Hence a 95% confidence interval for \(\sigma^2\) is,

\[ (ns^2/32.852,\ ns^2/8.907) = (21.92,\ 80.83). \tag{8.58} \]


Comments. From this confidence interval a 95% confidence
interval for \(\sigma\) may be obtained as,

\[ (\sqrt{21.92},\ \sqrt{80.83}). \tag{8.59} \]

It may be noticed that the interval given above is not a
unique one. For small sample sizes the \(\chi^2\) distribution is not a
symmetric one. So there is no guarantee that the central interval
is the shortest one.
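The recipe in (8.55) and (8.56) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `variance_confidence_interval` is hypothetical, and the two chi-square table values are supplied by the reader from a table, exactly as in the worked example above.

```python
import math


def variance_confidence_interval(n, s2, chi2_lower, chi2_upper):
    """Return (n*s2/chi2_upper, n*s2/chi2_lower), an interval for sigma^2.

    s2 is the sample variance with divisor n, as in the text.
    chi2_lower and chi2_upper are table values that cut off an area of
    alpha/2 in the lower and upper tails of a chi-square with n - 1
    degrees of freedom (the caller looks them up; nothing is computed here).
    """
    if not 0 < chi2_lower < chi2_upper:
        raise ValueError("need 0 < chi2_lower < chi2_upper")
    return n * s2 / chi2_upper, n * s2 / chi2_lower


# Values taken from the worked example: n = 20, s^2 = 36, 19 d.f., alpha = 0.05.
var_lo, var_hi = variance_confidence_interval(20, 36.0, 8.907, 32.852)

# Taking square roots of the end points gives an interval for sigma.
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)

print(f"interval for sigma^2: ({var_lo:.2f}, {var_hi:.2f})")
print(f"interval for sigma:   ({sd_lo:.2f}, {sd_hi:.2f})")
```

Passing the table values in as arguments keeps the sketch free of any statistical library; a reader with such a library could replace the two constants with computed quantiles.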

Exercises 

8.16. A random sample of 20 bullets produced by a machine shows a
standard deviation of 0.2 mm. in the measurement of their diameters. Assuming
that the diameter measurement is a \(N(\mu, \sigma)\) obtain a 96% interval
estimate of \(\sigma\).

8.17. A random sample of 10 housewives in a city shows an average
weight of 135 lbs with a standard deviation of 5 lbs. Assuming normality
for the weight measurements obtain a 99% interval estimate for the true
variance \(\sigma^2\).


8.18. A random sample of 12 married teenage girls shows an average
I.Q. of 90 with a standard deviation of 2. Assuming that the I.Q.'s of such
girls have a \(N(\mu, \sigma)\) obtain a 99% interval estimate for \(2\sigma\).

8.19. The standard deviation of the marks obtained by a random
sample of 20 freshmen in a particular university is 10. Assuming that the
marks of freshmen in this university are approximately a \(N(\mu, \sigma)\), obtain a
95% interval for \(3\sigma^2\).

8.20. If a random sample of size n is taken from a \(N(\mu, \sigma)\) then
\(S' = [\sum (X_i - \bar{X})^2/(n-1)]^{1/2}\) is approximately normally distributed with mean \(\sigma\)
and with variance \(\sigma^2/2n\). Construct a \(100(1-\alpha)\%\) confidence interval for \(\sigma\) by
using this approximation.


8.7. SUMMARY 

The following table on page 269 gives some of the confidence
intervals for the parameters mentioned in the same table. These
intervals are based on an observed random sample of size n in the
case of a single population, and two independent random samples
of sizes \(n_1\) and \(n_2\) in the case of two populations.
\(z_{\alpha/2}\), \(t_{\alpha/2, v}\), \(\chi^2_{\alpha/2, v}\), \(\chi^2_{1-\alpha/2, v}\),
\(S\), \(S'\), \(S''\) are defined by,

\[ \int_{z_{\alpha/2}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = \alpha/2, \qquad \int_{t_{\alpha/2, v}}^{\infty} f(t)\, dt = \alpha/2, \]

\[ \int_{\chi^2_{\alpha/2, v}}^{\infty} g(x)\, dx = \alpha/2 = \int_0^{\chi^2_{1-\alpha/2, v}} g(x)\, dx, \]

\[ S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/n, \qquad S'^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n-1), \]

\[ S''^2 = (n_1 S_1^2 + n_2 S_2^2)/(n_1 + n_2 - 2), \]

\[ S_1^2 = \sum_{i=1}^{n_1} (X_i - \bar{X})^2/n_1, \qquad S_2^2 = \sum_{i=1}^{n_2} (Y_i - \bar{Y})^2/n_2 \]

respectively, where \(f(t)\) and \(g(x)\) are the density functions of a
Student t with v degrees of freedom and a \(\chi^2\) with v degrees of
freedom respectively.

8.8. CONFIDENCE REGIONS

In the previous sections we have seen that we can establish
confidence intervals for a parameter in a probability distribution,
if we can get a statistic whose distribution is independent of the
parameter. In other words we can estimate a parameter in terms
of an interval such that the interval will contain the true value
of the parameter, with a probability of \(1-\alpha\), for any given \(\alpha\). In
this section we will consider the interval estimation of the parameters
of a probability distribution when there is more than one
parameter. For example in a \(N(\mu, \sigma)\) there are two parameters \(\mu\)
and \(\sigma\). Based on an observed sample, if a region is constructed
such that the region will cover the true parameter values, with a
probability of \(1-\alpha\), then we can say that the proposed region is a
\(100(1-\alpha)\%\) confidence region for the parameters \(\mu\) and \(\sigma\). Here also
the success in constructing a confidence region depends on the
selection of a suitable statistic which contains the parameters
but whose distribution is independent of the
parameters. The idea of a \(100(1-\alpha)\%\) confidence region is illustrated
in Fig. 8.6. This is a region in the parameter space (that
is the space generated by all possible values of the parameters).
In our example \(-\infty < \mu < \infty\), \(0 < \sigma < \infty\) and hence the parameter
space is the upper half plane. If there are three parameters,
the parameter space is a subset of a three dimensional
space. In Fig. 8.6 the probability that \((\mu, \sigma) \in C\) is \(1-\alpha\).

\[ P\{(\mu, \sigma) \in C\} = 1 - \alpha \]





Fig. 8.6 


Based on independent samples of size n if \(100(1-\alpha)\%\) confidence
regions are constructed, in the long run, \(100(1-\alpha)\%\) of
these regions will cover the true parameter point \((\mu, \sigma)\). These
ideas may be extended to the case of any number of parameters
and for any parent population. For a more thorough discussion of
these topics and other related topics like tolerance limits, fiducial
intervals, and Bayesian intervals, the reader is advised to see the
Advanced Theory of Statistics, Vol. 2 by M.G. Kendall and A.
Stuart. More references are given at the end of this chapter.

8.9. CONTROL CHARTS 

In industrial production processes it is often necessary to check
the quality of the product in order to keep the process 'under
control'. If pipes of a fixed inner diameter are produced, they
may not be good if the diameter goes below a limit or above a
certain limit. Examination of each and every item may not be
possible if a large number of items are produced in short intervals
of time. In order to keep the quality of a product within some
specified quality limits, industrial engineers use a control chart.
Even though control limits are different from confidence limits it
may be easier for the reader to pick up the ideas now. So a note
on control charts is given in this section.

8.91. Control Charts for Means. This is a control chart
based on the sample means. Suppose that a machine produces a
metal rod of length \(\mu_0\). If the length is distributed as a \(N(\mu_0, \sigma)\)
and if we take a random sample of size n then

\[ P\{\mu_0 - 3\sigma/\sqrt{n} < \bar{X} < \mu_0 + 3\sigma/\sqrt{n}\} = 0.997 \]

where \(\bar{X}\) denotes the sample mean. If random samples of size n
are taken at regular intervals and if an \(\bar{x}\) falls outside the interval
\((\mu_0 - 3\sigma/\sqrt{n},\ \mu_0 + 3\sigma/\sqrt{n})\) then the process may be called 'not under
control'. Here \(\mu_0 + 3\sigma/\sqrt{n}\) and \(\mu_0 - 3\sigma/\sqrt{n}\) are called the upper
and lower control limits. A chart is given in Fig. 8.7.



Fig. 8.7. 


The sample means of samples of fixed size, taken at regular
intervals of time, are plotted in the diagram. If the points fall
within the control limits the process is 'under control'. If a point
falls outside any of the control limits the fault may be corrected
by checking the process. Depending upon the nature of the
production processes various control limits can be set up and the
production process can be checked with the help of a control chart.
The limits in the chart of Fig. 8.7 may be called the \(3\sigma\) limits,
since they are based on a deviation equal to three times the
standard error of \(\bar{x}\).
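The control limits for means described above can be sketched as follows. The function names, the target value, the standard deviation and the list of sample means are all hypothetical and chosen only for illustration; the `width` argument allows \(2\sigma\) as well as \(3\sigma\) limits.

```python
import math


def mean_control_limits(mu0, sigma, n, width=3.0):
    """Lower and upper control limits mu0 -/+ width * sigma / sqrt(n)."""
    half_width = width * sigma / math.sqrt(n)
    return mu0 - half_width, mu0 + half_width


def out_of_control(sample_means, lower, upper):
    """Indices of the sample means that fall outside (lower, upper)."""
    return [i for i, m in enumerate(sample_means) if m < lower or m > upper]


# Hypothetical process: target length 10.0, sigma 0.4, samples of size 16,
# so the 3-sigma limits are roughly 9.7 and 10.3.
lower, upper = mean_control_limits(mu0=10.0, sigma=0.4, n=16)

# Hypothetical sample means taken at regular intervals.
means = [10.05, 9.92, 10.21, 10.35, 9.60]
flagged = out_of_control(means, lower, upper)

print(f"control limits: ({lower:.2f}, {upper:.2f})")
print("samples outside the limits:", flagged)
```

A point is flagged only when it lies strictly outside the limits; whether a point exactly on a limit counts as 'not under control' is a convention the text does not fix.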


8.92. Control charts for proportions. If \(\hat{p}\) is the sample
proportion of a sample of size n and if p is the true proportion, in
a binomial probability situation, then \(E(\hat{p}) = p\) and \(\mathrm{Var}(\hat{p})
= p(1-p)/n\). So a control chart for proportions may be set up as
shown in Fig. 8.8.



Fig. 8.8 


This chart is made by plotting the sample proportions \(\hat{p}\) at
regular time intervals. If a machine is known to produce a certain
percentage of defectives, then this percentage may be kept 'under control' by
taking a random sample of size n at regular time intervals and
plotting the proportion of defectives \(\hat{p}\). If \(\hat{p}\) falls within the
control limits set up, the production is 'under control'. Here also
the control limits may be set up according to the special nature of
the production process. The upper and lower control limits in
Fig. 8.8 are called the \(3\sigma\) limits because the limits are set up by
considering \(p \pm 3\sqrt{\mathrm{Var}(\hat{p})}\). In many processes the lower control
limit may not be of any interest to the manufacturer. In that
case only the upper limit is considered. Depending upon the
nature of the production process \(\sigma\), \(2\sigma\), or \(3\sigma\) limits (either
both upper and lower limits or only one limit alone) may be set
up for checking the quality of goods produced under a particular
production process. The same ideas can be used for setting up
quality limits for differences of means, for differences of proportions
or for variances etc. if desired. If the population is Normal
and if a \(3\sigma\) chart is made then we know that the probability of a
sample mean falling outside the control limits is only approximately
0.003, since \(P\{\mu_0 - 3\sigma/\sqrt{n} < \bar{X} < \mu_0 + 3\sigma/\sqrt{n}\} = 0.997\) approximately.

Even if the population is not normal we know by Chebyshev's
inequality that the probability of an \(\bar{x}\) falling outside the \(3\sigma\)
control limits is at most 1/9 whatever may be the population. If
\(\mu\) and \(\sigma\) are unknown, they can be replaced by appropriate
estimates which can be obtained from the producer's experience
or by examining a few samples.
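A chart for proportions can be sketched in the same way, using \(p \pm 3\sqrt{p(1-p)/n}\). The true proportion, the sample size and the defect counts below are hypothetical. Clipping the limits to the range 0 to 1 is an added assumption, made because a proportion cannot leave that range; the text itself does not mention it.

```python
import math


def proportion_control_limits(p, n, width=3.0):
    """Limits p -/+ width * sqrt(p(1-p)/n), clipped to [0, 1] (an assumption)."""
    standard_error = math.sqrt(p * (1 - p) / n)
    lower = max(0.0, p - width * standard_error)
    upper = min(1.0, p + width * standard_error)
    return lower, upper


def flag_proportions(defect_counts, n, lower, upper):
    """Indices of samples whose proportion of defectives is outside the limits."""
    return [i for i, count in enumerate(defect_counts)
            if count / n < lower or count / n > upper]


# Hypothetical process: true proportion 0.2, samples of size 400,
# so the standard error is 0.02 and the 3-sigma limits are about 0.14 and 0.26.
n = 400
lower, upper = proportion_control_limits(p=0.2, n=n)

# Hypothetical numbers of defectives found in successive samples.
counts = [70, 85, 110, 52]
flagged = flag_proportions(counts, n, lower, upper)

print(f"control limits: ({lower:.3f}, {upper:.3f})")
print("samples outside the limits:", flagged)
```

If only the upper limit is of interest to the manufacturer, the same function can be used and the lower limit simply ignored.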


Exercises 

8.21. Taking \(\mu_0 = 5\) and \(\sigma = 0.15\) construct \(2\sigma\) and \(3\sigma\) control charts
for the means. Plot the following data and check whether the process is 'out
of control' at any time. The sample means of the samples of size 9 taken at
half an hour intervals are given below. 4.87, 4.9, 5, 5.1, 5.11, 5.13, 5, 4.9, 4.85,
4.84, 4.87.

8.22. Taking the true proportion as 0.10 construct a \(3\sigma\) control chart
for the proportion of defectives in a production process and plot the follow¬ 
ing data. The number of defectives in a random sample of size 100 taken at 

one hour interval are 5,4,5,7,10,11,10,9,7,8,5,-1,2,0,3,5,5,7,8,11,12,14. Comment 
on the process.


CHAPTER 9


POINT ESTIMATION 


9.0. Introduction. In the last chapter we considered the
problem of giving an interval estimate for a parameter of a probability
distribution. In this chapter we will consider point estimates
or scalar (single) quantities as estimates of the parameters.
In day to day life we face many situations where we would like
to get an estimate of a certain unknown quantity. If we want
the average weight at a particular time of all teenagers in a
particular country, we can find this out if we observe the weight
of all the individuals in this population of teenagers. Another
method is to select a representative sample and take the sample
mean weight as an estimate for the average weight (population
mean of the population of the weight measurements). If the
weight measurement is distributed as a \(N(\mu, 1)\), the problem
reduces to the estimation of \(\mu\) in a \(N(\mu, 1)\) from an observed
random sample from a \(N(\mu, 1)\). If a manufacturer is interested in
the average lifetime of electric bulbs manufactured by a particular
process, he cannot test each and every bulb, for then there won't be
any bulb left for sale. He is compelled to obtain an estimate of
the average lifetime of electric bulbs by observing a sample (by
testing a sample of bulbs). Cost considerations and other numerous
factors will compel an experimenter to have an estimate of
the parameters. If the lifetime X of the bulbs is assumed to
have an exponential distribution, the problem is to estimate the
parameter E(X) in an exponential distribution. If scalar quantities
are given as estimates of the parameters, for example, the
population mean is estimated by the sample mean or sample
median, the population variance estimated by the sample variance
etc., such estimates are called point estimates. There are a
number of commonly used methods of obtaining point estimates.
Some of them are the method of moments, the method of maximum
likelihood, the method of least squares, the method of minimum
chi-square, Minimax, Invariance and Bayes procedures etc.
The method of moments and the method of maximum likelihood
will be discussed in the following sections and the other methods
will be dealt with later.


9.1. METHOD OF MOMENTS 


The motivation for this method is simple and straightforward.
We would like to estimate the population moments by the corresponding
sample moments. This is one of the oldest methods.
The sample moments are equated to the corresponding population
moments and the parameters are estimated. That is, by using
the equations,

\[ m'_r = \mu'_r, \quad r = 1, 2, \ldots \tag{9.1} \]

the parameters are estimated, where \(m'_r = \sum x_i^r/n\) and \(\mu'_r = E(X^r)\)
and \(x_1, \ldots, x_n\) is an observed sample. Of course this method is
applicable only if the population moments exist and they are
representable in terms of the parameters. For a Cauchy distribution
the first moment does not exist and hence this method of
estimation cannot be used there.

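Ex. 9.1.1. Estimate the parameters in (1) an exponential distribution
with the parameter \(\theta\), (2) a \(N(\mu, \sigma)\), by the method of moments,
based on observed random samples of size n.

Sol. Let \(x_1, \ldots, x_n\) be an observed sample. \(m'_1 = \sum x_i/n = \bar{x}\),
\(m'_2 = \sum x_i^2/n\).

(1) Let

\[ f(x) = \begin{cases} (1/\theta)\, e^{-x/\theta} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases} \tag{9.2} \]

\(E(X) = \theta\) and therefore \(m'_1 = \theta\) or \(\hat{\theta} = \bar{x}\), where (^) denotes
the estimated value. (9.3)

(2) \(E(X) = \mu\), \(E(X^2) = \mu'_2\), or \(\mu_2 = \mu'_2 - \mu'^2_1 = \sigma^2\).

The estimating equations are,

\[ m'_1 = \mu'_1 \Rightarrow \hat{\mu} = \bar{x}, \]

\[ m'_2 = \mu'_2 \Rightarrow \hat{\sigma}^2 = m'_2 - m'^2_1 = \sum (x_i - \bar{x})^2/n. \tag{9.4} \]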
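The two cases of Ex. 9.1.1 can be sketched in Python as below. The function names and the four data values are hypothetical; the code simply equates the first one or two sample moments to the population moments, as in (9.1).

```python
def sample_moments(xs):
    """First and second sample moments about the origin: sum(x)/n and sum(x^2)/n."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    return m1, m2


def mom_exponential(xs):
    """Moment estimate of theta in f(x) = (1/theta) exp(-x/theta): the sample mean."""
    m1, _ = sample_moments(xs)
    return m1


def mom_normal(xs):
    """Moment estimates (mu_hat, sigma2_hat) for a normal population."""
    m1, m2 = sample_moments(xs)
    return m1, m2 - m1 * m1


# Hypothetical observed sample.
data = [2.0, 4.0, 4.0, 6.0]

theta_hat = mom_exponential(data)
mu_hat, sigma2_hat = mom_normal(data)

print("exponential: theta_hat =", theta_hat)
print("normal: mu_hat =", mu_hat, " sigma2_hat =", sigma2_hat)
```

For these data the second estimate agrees with \(\sum (x_i - \bar{x})^2/n\) computed directly, which is the identity used in (9.4).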

9.2. THE METHOD OF MAXIMUM LIKELIHOOD 

Let \(x_1, \ldots, x_n\) be an observed random sample of size n from a
population \(f(x, \theta)\). The joint probability function of \(X_1, \ldots, X_n\) (whose
observed values are \(x_1, \ldots, x_n\)) is \(f(x_1, \theta) f(x_2, \theta) \cdots f(x_n, \theta) = L(\theta)\)
and this is considered to be a function of \(\theta\). R. A. Fisher, who introduced
this method, called L the likelihood function. If the parameters
are estimated by maximizing L with respect to the parameters,
the method is called the method of maximum likelihood.
Opinion is divided on the logical basis of maximizing L with respect
to the parameters which are constants, even though they may be
unknown. For a discussion of this topic, the reader may examine
the references at the end of this chapter. This method fails if
\(L(\theta)\) does not have a maximum, and if \(L(\theta)\) has a number of local
maxima we may take the largest one among them.

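Ex. 9.2.1. Given a random sample \(x_1, \ldots, x_n\) from a \(N(\mu, \sigma)\)
obtain the maximum likelihood estimates of \(\mu\) and \(\sigma\).

Sol. \(f(x, \theta) = (2\pi\sigma^2)^{-1/2}\, e^{-(x-\mu)^2/2\sigma^2}\)

\[ L = f(x_1, \theta) f(x_2, \theta) \cdots f(x_n, \theta) \tag{9.5} \]

\[ = (2\pi\sigma^2)^{-n/2}\, e^{-\sum_{i=1}^{n} (x_i-\mu)^2/2\sigma^2} \tag{9.6} \]

\[ \log L = -n \log \sigma - (n/2) \log (2\pi) - \sum_{i=1}^{n} (x_i-\mu)^2/2\sigma^2 \tag{9.7} \]

\[ \frac{\partial}{\partial \mu} \log L = 0 \Rightarrow \sum_{i=1}^{n} (x_i - \mu)/\sigma^2 = 0 \]

\[ \text{and } \frac{\partial}{\partial \sigma} \log L = 0 \Rightarrow -n/\sigma + \sum (x_i - \mu)^2/\sigma^3 = 0. \tag{9.8} \]

\[ \text{That is, } \hat{\mu} = \bar{x} \text{ and } \hat{\sigma} = [\sum (x_i - \bar{x})^2/n]^{1/2}. \tag{9.9} \]

It is easily seen that log L is a maximum at \(\hat{\mu}\) and \(\hat{\sigma}\). If \(\hat{\mu}\)
and \(\hat{\sigma}\) maximize log L they maximize L also. Hence the maximum
likelihood estimates of \(\mu\) and \(\sigma\) are \(\hat{\mu}\) and \(\hat{\sigma}\) respectively.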
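The claim that log L is largest at the estimates in (9.9) can be checked numerically. The sketch below is only a spot check on a small grid around the estimates, not a proof; the function names and the data are hypothetical.

```python
import math


def normal_log_likelihood(xs, mu, sigma):
    """log L from (9.7): -n log(sigma) - (n/2) log(2 pi) - sum((x - mu)^2) / (2 sigma^2)."""
    n = len(xs)
    sum_sq = sum((x - mu) ** 2 for x in xs)
    return -n * math.log(sigma) - (n / 2) * math.log(2 * math.pi) - sum_sq / (2 * sigma ** 2)


def normal_mle(xs):
    """Closed-form estimates from (9.9): sample mean and root of sum((x - xbar)^2)/n."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat


# Hypothetical observed sample.
data = [4.1, 5.3, 4.8, 6.0, 5.2]
mu_hat, sigma_hat = normal_mle(data)
best = normal_log_likelihood(data, mu_hat, sigma_hat)

# Evaluate log L at nearby parameter values; none should exceed the value at the estimates.
nearby = [normal_log_likelihood(data, mu_hat + dm, sigma_hat + ds)
          for dm in (-0.1, 0.0, 0.1)
          for ds in (-0.1, 0.0, 0.1)]

print("mu_hat =", round(mu_hat, 4), " sigma_hat =", round(sigma_hat, 4))
print("log L at the estimates =", round(best, 4))
print("largest nearby value   =", round(max(nearby), 4))
```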

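Comments. It may be noticed that we need not always
take the logarithm of L. The logarithm is taken only for convenience.
Obtaining maxima or minima by differentiation may not
always be possible. In those cases we will maximize L by using
some other methods. If \(\hat{\theta}\) is the maximum likelihood estimate of
\(\theta\) and if \(\phi(\theta)\) is a non-trivial function of \(\theta\) then \(\phi(\hat{\theta})\) is the maximum
likelihood estimate of \(\phi(\theta)\). This is easily seen from the
following results.

\[ \frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial \phi(\theta)} \cdot \frac{\partial \phi(\theta)}{\partial \theta}. \]

\(\partial \phi(\theta)/\partial \theta \neq 0\) implies that \(\partial L/\partial \theta\) and \(\partial L/\partial \phi(\theta)\)
vanish together and hence the estimate of \(\phi(\theta)\) is \(\phi(\hat{\theta})\). So by
using this property the maximum likelihood estimate of \(\sigma^2\) in the
above example is \(\hat{\sigma}^2 = \sum (x_i - \bar{x})^2/n\).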

Ex. 9.2.2. Obtain the maximum likelihood estimate of \(\theta\) in
the following distribution.

\[ f(x, \theta) = \begin{cases} 1/\theta & \text{for } 0 < x < \theta \\ 0 & \text{elsewhere.} \end{cases} \]

Sol. Let \(x_1, \ldots, x_n\) be an observed random sample from
\(f(x, \theta)\).

\[ L = 1/\theta^n \text{ and } 0 < x_1, \ldots, x_n < \theta. \tag{9.10} \]

Differentiation techniques are not of much help here. If \(\hat{\theta}\)
is the required estimate for \(\theta\), then \(\hat{\theta}\) maximizes L, which implies
that \(\hat{\theta}\) minimizes \(\theta^n\), which implies that \(\hat{\theta}\) minimizes \(\theta\). Based on
the observations, the smallest possible value which may be assigned
to \(\theta\), for which L is a maximum, is the largest of the observations,
since \(0 < x_1, \ldots, x_n < \theta\). Hence \(\hat{\theta} = \max(x_1, \ldots, x_n)\) (the largest
of the observations).

Comments. If we accept the statement that L is a function
of the parameters, the maximum likelihood procedure will be
the simple mathematical problem of maximizing L with respect
to the parameters. Even though the examples worked out are
for the continuous distributions, we can apply the method to discrete
as well as to continuous distributions. Here if we use the
method of moments, we get \(\hat{\theta}\) to be \(2\bar{x}\), since \(E(X) = \theta/2\).
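The two estimates of \(\theta\) for the uniform distribution of Ex. 9.2.2 can be compared on a small sample. The function names and the data are hypothetical. One assumption is made in the code: the density is taken to be \(1/\theta\) on \(0 < x \le \theta\) (closed at \(\theta\)), so that the likelihood is actually attained at the largest observation.

```python
def uniform_likelihood(xs, theta):
    """L = theta^(-n) if every observation lies in (0, theta], and 0 otherwise.

    The interval is closed at theta (an assumption of this sketch) so that
    the maximum of L is attained at theta = max(xs).
    """
    if theta <= 0 or min(xs) <= 0 or max(xs) > theta:
        return 0.0
    return theta ** (-len(xs))


def uniform_mle(xs):
    """Maximum likelihood estimate: the largest observation."""
    return max(xs)


def uniform_mom(xs):
    """Method of moments estimate: twice the sample mean, since E(X) = theta / 2."""
    return 2 * sum(xs) / len(xs)


# Hypothetical observed sample.
data = [1.0, 2.0, 9.0]

theta_mle = uniform_mle(data)
theta_mom = uniform_mom(data)

print("maximum likelihood estimate:", theta_mle)
print("method of moments estimate: ", theta_mom)
print("L at the moment estimate:   ", uniform_likelihood(data, theta_mom))
```

With these data the moment estimate is smaller than the largest observation, so the likelihood there is zero; this illustrates how the two methods can disagree.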


Exercises

9.1. Inspection of a random sample of 20 oranges from a large shipment
of oranges showed that 2 are spoiled. Obtain a point estimate of the
proportion of spoiled ones in the shipment, by the method of moments and
by the method of maximum likelihood.

9.2. A switchboard received 2 and 3 phone calls in two randomly
selected 5 minute intervals, respectively. Assuming a Poisson distribution
for the number of calls in a ten minute interval, time being measured
in 5 minute units, obtain a point estimate for the expected number of calls in
a 10 minute interval.

9.3. If \(x_1, \ldots, x_n\) is a random sample from a Gamma distribution with
the parameters \(\alpha\) and \(\beta\), obtain the maximum likelihood estimates of \(2\alpha\) and

9.4. The income in excess of $2,000 of the people in a city is distributed
exponentially. Three people, selected at random from this city, have
incomes $3000, $.000 and $10,000 respectively. Obtain a point estimate of
the expected income of a person in this city, by the method of maximum
likelihood.

9.5. After the appointment of a new salesgirl the sales in a shop have
increased. Four randomly selected days show an increase of $100, $300, $400
and $600 respectively. Assuming that the increase in sales has an exponential
distribution and the provincial tax is 5% of the sales, obtain a point
estimate of the expected increase per day of the provincial tax returns from
this shop.

9.6. Obtain the point estimates of the parameters in the distribution
\(f(x, \theta) = 1/(\beta - \alpha)\) for \(\alpha < x < \beta\), \(\alpha > 0\) and is zero elsewhere, by the method of
maximum likelihood. (Assume that a random sample is given).

9.7. If \(f(x) = (\theta + 1)x^{\theta}\) for \(0 < x < 1\), \(\theta > 0\) and is zero elsewhere, obtain
a point estimate of \(\theta\) by the method of moments and by the method of maximum
likelihood. (Assume that a random sample of size n is given).

PROPERTIES OF ESTIMATORS


It is seen that the different estimation procedures discussed
above can lead to different estimates for the same parameter. If
we estimate the same parameter using two different methods we
may get different estimates. Besides the
methods discussed, one may argue that an observed value of any
statistic may be taken as an estimate for a parameter. In
general, any statistic, if it is used to estimate a parameter, will
be called an estimator. In the following sections we will formulate
some desirable properties of estimators, so that we will be able to
say that one estimate is better than another estimate or to show
that one particular estimate is the best of all possible estimates
having some general properties. Since there is no unique set of
criteria by which one can select an estimator which is the best
among all possible estimators, we will discuss some general criteria
which are desirable under some conditions. The basis of preferring
one estimator to another depends upon the purpose for which
the estimator is used. In a particular case one estimator may be
preferable to another estimator satisfying more properties. So
in the following sections a number of criteria will be discussed and
the desirable ones for a particular situation can be picked up by
studying the conditions and the purpose for which the estimator
is used. In other words the selection of a particular estimator is
left to the experimenter who wants to use an estimator and who
knows his experimental conditions.


9.3. UNBIASEDNESS 

If \(\hat{\theta}\) is an estimator of \(\theta\) such that \(E(\hat{\theta}) = \theta\) then \(\hat{\theta}\)
is called an unbiased estimator and the value assumed by
\(\hat{\theta}\) is called an unbiased estimate for \(\theta\). (Whenever there
is no confusion we will use \(\hat{\theta}\) for the estimator as well as for
the estimate). We know that \(E(\bar{X}) = \mu\) and hence the sample mean
is an unbiased estimator of the population mean whatever may be
the population. If \(x_1, \ldots, x_n\) is an observed sample then
\(\bar{x} = (x_1 + \cdots + x_n)/n\) may be considered to be a value assumed by \(\bar{X}\) and
hence \(\bar{x}\) is called an unbiased estimate of \(\mu\). But we know that
\(E(S^2) = E \sum (X_i - \bar{X})^2/n \neq \sigma^2\). Hence the sample variance is not an
unbiased estimator for the population variance. But
\(E \sum (X_i - \bar{X})^2/(n-1) = \sigma^2\) and therefore \(\sum (X_i - \bar{X})^2/(n-1)\) is an unbiased
estimator for \(\sigma^2\), whatever may be the population. Here the bias in
an estimator \(S^2\) is easily removed by multiplying the estimator by
a constant. This procedure is not applicable if the expected value
of the estimator is a complicated function of the parameter. In
that case some general methods for removing the bias are available
(see the references at the end of this chapter). Unbiasedness
is a desirable property. If we go on taking random samples of
the same size, we would like our estimator to assume the parameter
value, on the average, in the long run. However sometimes from
other considerations, unbiasedness may not be desirable. If there
exists a statistic T such that \(E(T) = g(\theta)\) then \(g(\theta)\) is said to be
an estimable parametric function.
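The bias of the divisor-n variance and its removal by the divisor n - 1 can be verified exactly on a tiny example, by listing every possible sample. The population {0, 1, 2} with equal probabilities, the sample size 3 and the function names are hypothetical choices made only so that the enumeration stays small; exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product


def expected_value_of(statistic, values, n):
    """Exact E[statistic] over all equally likely samples of size n drawn with replacement."""
    samples = list(product(values, repeat=n))
    return sum(statistic(sample) for sample in samples) / len(samples)


def variance_divisor_n(sample):
    """S^2 = sum((x - xbar)^2) / n, computed with exact fractions."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n


def variance_divisor_n_minus_1(sample):
    """sum((x - xbar)^2) / (n - 1), computed with exact fractions."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


# Hypothetical population: the values 0, 1, 2, each with probability 1/3.
values = (0, 1, 2)
population_mean = Fraction(sum(values), len(values))
population_variance = sum((v - population_mean) ** 2 for v in values) / len(values)

n = 3
expected_biased = expected_value_of(variance_divisor_n, values, n)
expected_unbiased = expected_value_of(variance_divisor_n_minus_1, values, n)

print("population variance:      ", population_variance)
print("E[S^2] with divisor n:    ", expected_biased)
print("E[S^2] with divisor n - 1:", expected_unbiased)
```

The first expectation equals \((n-1)/n\) times the population variance, so multiplying \(S^2\) by the constant \(n/(n-1)\) removes the bias, as stated in the text.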

Ex. 9.3.1. Show that in a binomial probability situation the
estimate of the probability of a success p, by the method of maximum
likelihood, is unbiased for p.

Sol. If \(X_1, \ldots, X_N\) denote the independent stochastic variables
taking the values 1 with probability p and 0 with probability
\((1-p)\) and other values with zero probabilities, then the
number of successes X in N independent trials is,

\[ X = X_1 + \cdots + X_N. \]

The maximum likelihood estimate of p is easily seen to be,

\[ \hat{p} = x/N = \text{observed proportion of successes.} \tag{9.11} \]

Therefore, the estimator is X/N and

\[ E(X/N) = (1/N)E(X) = Np/N = p. \tag{9.12} \]

Hence X/N is an unbiased estimator of p or \(\hat{p} = x/N\) is an unbiased
estimate of p.

9.4. CONSISTENCY 

If \(\hat{\theta}_1\) and \(\hat{\theta}_2\) are two unbiased estimates of \(\theta\), we will need
further criteria in order to call one better than the other. Another
desirable property of an estimator is called consistency or stochastic
convergence. If the probability of \(\hat{\theta}\) tending to \(\theta\) approaches
one when n tends to infinity, \(\hat{\theta}\) is called a consistent estimator
of \(\theta\).

That is,

\[ P\{\hat{\theta} \to \theta\} \to 1 \text{ as } n \to \infty \tag{9.13} \]

or corresponding to any given \(\epsilon > 0\) and \(\delta > 0\), however small they may be,

\[ P\{|\hat{\theta} - \theta| > \epsilon\} < \delta \tag{9.14} \]

for \(n \geq\) some specified value \(n_0\). This is the same as saying that
\(\hat{\theta}\) converges to \(\theta\) in probability.

For example if we want to estimate the population mean of
a finite population of size N and if we take a random sample of
size n, the sample mean coincides with the population mean when
n = N. When the sample size n approaches N we would expect
the sample mean \(\bar{x}\) to approach the population mean \(\mu\). In
general, in an infinite population when \(n \to \infty\) we would like to
have the probability of our estimate coinciding with the parameter
approximately equal to one. It may be noticed that 'consistency'
is a large sample concept, for here we are dealing with the property
of the estimator when n is very large.

If t is a stochastic variable, Chebyshev's inequality (see
section 3.5) gives

\[ P\{|t - E(t)| > k\} \leq \mathrm{Var}(t)/k^2. \tag{9.15} \]

Therefore we can give the following theorem.

Theorem 9.1. If \(\hat{\theta}\) is an unbiased estimator of \(\theta\) and if
\(\mathrm{Var}(\hat{\theta}) \to 0\) as \(n \to \infty\), then \(\hat{\theta}\) is consistent for \(\theta\).

Proof. By using the inequality (9.15),
\(P\{|\hat{\theta} - \theta| > k\} \leq \mathrm{Var}(\hat{\theta})/k^2\) where k is any arbitrary positive
constant. But \(\mathrm{Var}(\hat{\theta}) \to 0\) as \(n \to \infty\). Therefore when \(n \to \infty\),

\[ P\{|\hat{\theta} - \theta| > k\} \to 0 \text{ or } \hat{\theta} \text{ converges to } \theta \text{ in probability.} \tag{9.16} \]

If \(\hat{\theta}\) is a consistent estimator of \(\theta\), \(n\hat{\theta}/(n+1)\) is also consistent
for \(\theta\). A number of consistent estimators may be constructed. It
may be noticed that a consistent estimator need not be unbiased.
For example if \(\hat{\theta}\) is a consistent and an unbiased estimator of \(\theta\)
then \(n\hat{\theta}/(n+2)\) is not unbiased but is a consistent estimator of \(\theta\).

Ex. 9.4.1. Show that the sample mean is an unbiased and
consistent estimator of the population mean of any population
having a finite variance.

Sol.

\[ E(\bar{X}) = (1/n)E(X_1 + \cdots + X_n) = (1/n)[E(X_1) + \cdots + E(X_n)] = (1/n)(\mu + \cdots + \mu) = \mu \tag{9.17} \]

(implies unbiasedness).

\[ \mathrm{Var}(\bar{X}) = \sigma^2/n \tag{9.18} \]

where \(\sigma^2\) is the population variance (see Ex. 5.6.1).
\(\mathrm{Var}(\bar{X}) \to 0\) as \(n \to \infty\). Therefore by theorem 9.1, \(\bar{X}\) is a
consistent estimator of \(\mu\).
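The argument of Theorem 9.1 applied to the sample mean can be made concrete: Chebyshev's inequality bounds the probability that the sample mean is more than a given distance from the population mean by \(\sigma^2/(n\epsilon^2)\), and this bound can be pushed below any \(\delta\) by taking n large enough. The function names and numerical values below are hypothetical.

```python
import math


def chebyshev_bound(sigma2, n, eps):
    """Upper bound Var(sample mean) / eps^2 = sigma2 / (n eps^2), capped at 1."""
    return min(1.0, sigma2 / (n * eps ** 2))


def sample_size_for(sigma2, eps, delta):
    """A sample size n for which the Chebyshev bound is at most delta."""
    return math.ceil(sigma2 / (delta * eps ** 2))


# Hypothetical population variance, tolerance and probability level.
sigma2, eps, delta = 4.0, 0.5, 0.05

n_needed = sample_size_for(sigma2, eps, delta)
bounds = [chebyshev_bound(sigma2, n, eps) for n in (10, 100, 1000, 10000)]

print("a sample size that suffices:", n_needed)
print("bounds for n = 10, 100, 1000, 10000:", bounds)
```

The bounds shrink toward zero as n grows, which is the content of (9.16). Chebyshev's bound is usually far from sharp, so the sample size returned here is sufficient but typically much larger than necessary.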


9.5. RELATIVE EFFICIENCY 

If \(\hat{\theta}_1\) and \(\hat{\theta}_2\) are unbiased and consistent estimators of \(\theta\),
we need more criteria in order to select the better one. If
\(\mathrm{Var}(\hat{\theta}_1) > \mathrm{Var}(\hat{\theta}_2)\), we would prefer the one which has smaller dispersion
and we will choose \(\hat{\theta}_2\) over \(\hat{\theta}_1\). Since \(\mathrm{Var}(\hat{\theta}_1)\) and \(\mathrm{Var}(\hat{\theta}_2)\) are
measures of dispersion of \(\hat{\theta}_1\) and \(\hat{\theta}_2\) from \(\theta\) respectively, we will
base our next criterion on the dispersions of the estimators for
the parameter. The relative efficiency of \(\hat{\theta}_1\) with respect to \(\hat{\theta}_2\) is
defined as

\[ e = E(\hat{\theta}_2 - \theta)^2 / E(\hat{\theta}_1 - \theta)^2 \tag{9.19} \]

where \(\hat{\theta}_1\) and \(\hat{\theta}_2\) are two estimators of \(\theta\) and E denotes 'mathematical
expectation'. If \(E(\hat{\theta}_1) = \theta = E(\hat{\theta}_2)\) then

\[ e = \mathrm{Var}(\hat{\theta}_2)/\mathrm{Var}(\hat{\theta}_1). \tag{9.20} \]

If \(e > 1\) then \(\hat{\theta}_1\) is more efficient than \(\hat{\theta}_2\). If \(\hat{\theta}\) is an estimator of
\(\theta\) such that \(E(\hat{\theta}) = \theta\) and \(\hat{\theta}\) has a variance smaller than that of any
other unbiased estimator of \(\theta\) then \(\hat{\theta}\) is called a 'minimum variance
unbiased estimator'. If \(e > 1\) when the sample size tends to
infinity then \(\hat{\theta}_1\) can be called 'asymptotically more efficient' than
\(\hat{\theta}_2\). Again asymptotic efficiency is a large sample concept. For
example the sample mean and the sample median are unbiased
estimators of the population mean in a normal population.
But \(\mathrm{Var}(\bar{X}) = \sigma^2/n\) and, approximately for large n, \(\mathrm{Var}(m) = \pi\sigma^2/2n\)
(see section 6.4), where m is the sample median. Since
\(\mathrm{Var}(\bar{X}) < \mathrm{Var}(m)\), \(\bar{X}\) is more
efficient than m. We will state the following theorem without
proof.

Theorem 9.2. If \(\hat{\theta}\) is an unbiased estimator of \(\theta\) and if
\(\frac{\partial}{\partial \theta} \log L = k(\hat{\theta} - \theta)\), where k is independent of the sample observations,
but may be a function of \(\theta\), then \(\hat{\theta}\) is the minimum variance
unbiased estimator of \(\theta\), where L is the likelihood function.

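Ex. 9.5.1. Show that the sample mean is the minimum variance
unbiased estimator of \(\mu\) in \(N(\mu, \sigma)\).

Sol. Let \(x_1, \ldots, x_n\) be an observed sample from a \(N(\mu, \sigma)\).

\[ L = (2\pi\sigma^2)^{-n/2}\, e^{-\sum_{i=1}^{n} (x_i - \mu)^2/2\sigma^2}. \]

\[ \log L = -n \log \sigma - (n/2) \log (2\pi) - \sum_{i=1}^{n} (x_i - \mu)^2/2\sigma^2 \]

\[ \frac{\partial}{\partial \mu} \log L = \sum_{i=1}^{n} (x_i - \mu)/\sigma^2 = n(\bar{x} - \mu)/\sigma^2. \tag{9.21} \]

That is, \(\frac{\partial}{\partial \mu} \log L = k(\bar{x} - \mu)\) where \(k = n/\sigma^2\), which is independent
of \(x_1, \ldots, x_n\). Also we know that \(E(\bar{X}) = \mu\). Hence \(\bar{X}\) is the
minimum variance unbiased estimator of \(\mu\) in a \(N(\mu, \sigma)\).

Comments. We know that \(\mathrm{Var}(\bar{X}) = \sigma^2/n\). But \(\bar{X}\) is the
minimum variance estimator and hence any other unbiased estimator of
\(\mu\) has a larger variance. It may be noticed that \(1/k = \mathrm{Var}(\bar{X})
= \sigma^2/n\). In general if \(\frac{\partial}{\partial \theta} \log L = k(\hat{\theta} - \theta)\) where \(E(\hat{\theta}) = \theta\) then
\(1/k = \mathrm{Var}(\hat{\theta})\).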
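The identity (9.21) and the ratio in (9.20) can be checked with a few lines of arithmetic. The data, the parameter values and the function names are hypothetical; the variance of the sample median is taken to be the large-sample value \(\pi\sigma^2/2n\) quoted in the text.

```python
import math


def score_for_mu(xs, mu, sigma):
    """Derivative of log L with respect to mu: sum(x - mu) / sigma^2."""
    return sum(x - mu for x in xs) / sigma ** 2


def k_times_deviation(xs, mu, sigma):
    """The form k (xbar - mu) with k = n / sigma^2, as in Theorem 9.2."""
    n = len(xs)
    xbar = sum(xs) / n
    return (n / sigma ** 2) * (xbar - mu)


def relative_efficiency(variance_second, variance_first):
    """e = Var(second estimator) / Var(first estimator), for unbiased estimators."""
    return variance_second / variance_first


# Hypothetical sample and parameter values.
data = [4.1, 5.3, 4.8, 6.0, 5.2]
mu, sigma = 5.0, 1.5
n = len(data)

lhs = score_for_mu(data, mu, sigma)
rhs = k_times_deviation(data, mu, sigma)

# Efficiency of the sample mean relative to the sample median (large-sample variances).
e = relative_efficiency(math.pi * sigma ** 2 / (2 * n), sigma ** 2 / n)

print("score:", round(lhs, 6), " k (xbar - mu):", round(rhs, 6))
print("relative efficiency of the mean with respect to the median:", round(e, 4))
```

The ratio does not depend on n or \(\sigma\); it equals \(\pi/2\), which is greater than 1, in agreement with the statement that the mean is more efficient than the median in a normal population.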

9.6. SUFFICIENCY 

Another desirable property of an estimator is sufficiency.
A statistic \(\hat{\theta}\) is called sufficient for \(\theta\) if the conditional
distribution of the sample values, given \(\hat{\theta}\), is independent
of \(\theta\). That is, \(f(x_1, \ldots, x_n \mid \hat{\theta})\) is independent of \(\theta\), where
\(x_1, \ldots, x_n\) is the random sample under consideration. If the
conditional distribution of \(X_1, \ldots, X_n\) given \(\hat{\theta}\) is independent of
\(\theta\), we may say that \(\hat{\theta}\) contains all relevant information in
the sample about the parameter \(\theta\). We may also define a sufficient
statistic as a statistic which contains all relevant information
about the parameter in the sample. But the likelihood function
\(L = f(x_1, \theta) \cdots f(x_n, \theta)\). If the likelihood function L
can be factorized into two functions where one is a function of \(\hat{\theta}\)
and \(\theta\) and the other is independent of \(\theta\) then \(\hat{\theta}\) is sufficient for \(\theta\).
All these three definitions are one and the same. Hence a sufficient
statistic implies a reduction in data. Instead of considering all
sample values we need consider only a sufficient statistic as far
as the 'information' about the parameter is concerned.

Ex. 9.6.1. In a \(N(\mu, 1)\) show that the sample mean is a
sufficient estimator of \(\mu\).

Sol.

\[ L = (2\pi)^{-n/2}\, e^{-\sum_{i=1}^{n} (x_i - \mu)^2/2} = (2\pi)^{-n/2}\, e^{-\sum_{i=1}^{n} (x_i - \bar{x})^2/2}\, e^{-n(\bar{x} - \mu)^2/2}. \tag{9.22} \]

But \((2\pi)^{-n/2}\, e^{-\sum (x_i - \bar{x})^2/2}\) is independent of \(\mu\) and
\(e^{-n(\bar{x} - \mu)^2/2}\) is a function of \(\bar{x}\) and \(\mu\). Hence \(\bar{X}\) is sufficient for
\(\mu\) since L is factorized into two functions where one is a function
of the estimate and the parameter and the other is independent
of the parameter.

This can also be demonstrated by showing that the conditional
distribution of \(X_1, \ldots, X_n\) given \(\bar{X}\) is independent of \(\mu\). The
distribution of the sample mean of a random sample of size n
from a \(N(\mu, 1)\) is

\[ f_2(\bar{x}) = (n/2\pi)^{1/2}\, e^{-n(\bar{x} - \mu)^2/2}. \tag{9.23} \]

Therefore

\[ f_1(x_1, \ldots, x_n \mid \bar{x}) = L/f_2(\bar{x}) = (2\pi)^{-(n-1)/2}\, n^{-1/2}\, e^{-\sum (x_i - \bar{x})^2/2} \tag{9.24} \]

which is independent of \(\mu\).
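The factorization in (9.22) can be checked numerically. The sketch below compares the likelihood with the product of the two factors, and also shows that two hypothetical samples with the same mean give the same likelihood ratio between any two values of \(\mu\), which is what sufficiency of the sample mean amounts to here. All names and data are hypothetical.

```python
import math


def likelihood(xs, mu):
    """L for a N(mu, 1) sample: (2 pi)^(-n/2) exp(-sum((x - mu)^2) / 2)."""
    n = len(xs)
    return (2 * math.pi) ** (-n / 2) * math.exp(-sum((x - mu) ** 2 for x in xs) / 2)


def factorized_likelihood(xs, mu):
    """Product of the factor free of mu and the factor depending on xbar and mu."""
    n = len(xs)
    xbar = sum(xs) / n
    free_of_mu = (2 * math.pi) ** (-n / 2) * math.exp(-sum((x - xbar) ** 2 for x in xs) / 2)
    depends_on_mu = math.exp(-n * (xbar - mu) ** 2 / 2)
    return free_of_mu * depends_on_mu


def likelihood_ratio(xs, mu_a, mu_b):
    """L(mu_a) / L(mu_b); by (9.22) this depends on the sample only through xbar."""
    return likelihood(xs, mu_a) / likelihood(xs, mu_b)


# Two hypothetical samples with the same mean (2.0) but different spread.
sample_one = [1.0, 2.0, 3.0]
sample_two = [0.0, 2.0, 4.0]

ratio_one = likelihood_ratio(sample_one, 1.5, 2.5)
ratio_two = likelihood_ratio(sample_two, 1.5, 2.5)

print("L direct:    ", likelihood(sample_one, 1.5))
print("L factorized:", factorized_likelihood(sample_one, 1.5))
print("likelihood ratios for the two samples:", ratio_one, ratio_two)
```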

Comments. From this example it may be noticed that if
\(\bar{X}\) is sufficient for \(\mu\), \(2\bar{X}\) is also sufficient for \(\mu\), since L can be
factorized into \(f_1\) and \(f_2\) where \(f_2\) is a function of \(2\bar{x}\) and \(\mu\), and \(f_1\) is
independent of \(\mu\). In general if \(\hat{\theta}\) is sufficient for \(\theta\) then any one-to-one
function of \(\hat{\theta}\) is also sufficient for \(\theta\). \(L = f_1(\hat{\theta}, \theta) f_2\) where
\(f_2\) is independent of \(\theta\). If L is maximized with respect to \(\theta\) it is
equivalent to maximizing \(f_1(\hat{\theta}, \theta)\) since \(f_2\) is independent of \(\theta\).
Hence the maximum likelihood estimate of \(\theta\) will be a function of
\(\hat{\theta}\) (that is, a function of a sufficient estimate for \(\theta\)). So we may
state the following theorem.

Theorem 9.3. If a single sufficient statistic \(\hat{\theta}\) for \(\theta\) exists,
then the maximum likelihood estimate of \(\theta\) will be a function of \(\hat{\theta}\).

9.61. Joint Sufficiency. If there are k parameters \(\theta_1, \ldots, \theta_k\)
in a distribution and if there exist s statistics \(t_1, \ldots, t_s\) such that,

\[ L = f_1(t_1, \ldots, t_s, \theta_1, \ldots, \theta_k)\, f_2 \tag{9.25} \]

where L is the likelihood function, \(f_1\) is a function of the statistics
and the parameters and \(f_2\) is independent of the parameters, then
\(t_1, \ldots, t_s\) are said to be jointly sufficient for \(\theta_1, \ldots, \theta_k\). Since s need
not be equal to k, joint sufficiency need not imply that \(t_1\) is
sufficient for \(\theta_1\), \(t_2\) is sufficient for \(\theta_2\) etc. Even if k = s joint sufficiency
need not imply individual sufficiency. If there exists a
minimum number r of statistics \(t_1, \ldots, t_r\) such that they are jointly
sufficient for \(\theta_1, \ldots, \theta_k\) then \(t_1, \ldots, t_r\) is called a minimal set of sufficient
statistics for \(\theta_1, \ldots, \theta_k\). Since there can exist a number of sets of
sufficient statistics for the parameters, 'smallest' is used in the
sense that this minimal set is a function of all other sets of
sufficient statistics. In other words, further reduction of data is
not possible without losing sufficiency. Incidentally it may be
noted that the sample itself forms a set of sufficient statistics for
the parameters, in any case.

Ex. 9.6.1. Show that the sample mean and the sample variance
are jointly sufficient for $\mu$ and $\sigma^2$ in a $N(\mu, \sigma)$.

Sol. $L = (2\pi\sigma^2)^{-n/2}\, e^{-\sum (x_i - \mu)^2 / 2\sigma^2}$

$= (2\pi\sigma^2)^{-n/2}\, e^{-\sum (x_i - \bar{x})^2 / 2\sigma^2 \; - \; n(\bar{x} - \mu)^2 / 2\sigma^2}$    (9.26)

$= f_1(s^2, \bar{x}, \sigma^2, \mu)\, f_2$    (9.27)

where $f_2 = (2\pi)^{-n/2}$ and $f_1$ is a function of $\mu$, $\sigma^2$, $\bar{x}$ and $s^2$.

Hence the result.

Comments. $f_2$ need not be taken as $[1/(2\pi)^{n/2}]$. $f_2$ may be
arbitrarily fixed according to the definition. The only restriction
is that $f_1$ should be a function of $s^2$, $\bar{x}$, $\mu$ and $\sigma^2$ and $f_2$ should not
contain $\mu$ and $\sigma^2$.
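
The factorization in the example above can be checked numerically. The following is a minimal Python sketch, not part of the original text: it assumes the sample variance is taken with divisor $n$, and the data values and function names are purely illustrative. It evaluates the normal log-likelihood once from the full sample and once from $n$, the sample mean and the sample variance only.

```python
import math

def log_likelihood_full(xs, mu, sigma2):
    """Normal log-likelihood computed from every observation."""
    n = len(xs)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

def log_likelihood_reduced(n, xbar, s2, mu, sigma2):
    """Same quantity using only n, the sample mean and the sample variance.

    Assumption: s2 = sum((x - xbar)^2) / n (divisor n, not n - 1).
    """
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - (n * s2 + n * (xbar - mu) ** 2) / (2 * sigma2))

# Illustrative data, not taken from the text.
xs = [2.1, 3.4, 1.9, 4.0, 2.6]
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / n

full = log_likelihood_full(xs, mu=2.5, sigma2=1.7)
reduced = log_likelihood_reduced(n, xbar, s2, mu=2.5, sigma2=1.7)
print(full, reduced)
```

The two values agree for any choice of the parameters, which is what joint sufficiency of the sample mean and the sample variance expresses.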

As sufficient statistics and maximum likelihood estimators
are very useful in statistical analysis, we will state a few theorems
without proofs, for the information of the reader. In the following
discussion we consider only a single parameter case and the
parameter will be denoted by $\theta$, where $\theta \in \Omega$ (parameter space).

Theorem 9.4. If $\log f(x, \theta)$ is differentiable with respect
to $\theta$ in an interval containing the true $\theta$, where $f(x, \theta)$ is the
probability function under consideration, then the maximum likelihood
estimator is consistent for $\theta$.

Theorem 9.5. Under some general regularity conditions
which will be stated below, the maximum likelihood estimator is
asymptotically (that is, when the sample size $n \to \infty$) most efficient,
sufficient and normally distributed.

Regularity conditions. These are some general and
reasonable conditions on the probability function $f(x, \theta)$ under
consideration. Let $\frac{\partial}{\partial\theta} f(x, \theta) = f'(x, \theta)$, $\frac{\partial^2}{\partial\theta^2} f(x, \theta) = f''(x, \theta)$ and
let E denote 'mathematical expectation'.

Condition 1. The derivatives $f'(x, \theta)$, $f''(x, \theta)$ and $f'''(x, \theta)$
exist for almost all values of $x$ in an interval I of $\theta$ containing the
true value of $\theta$.

Condition 2. At the true value of $\theta$, let $E[f'(X, \theta)/f(X, \theta)] = 0$,
$E[f''(X, \theta)/f(X, \theta)] = 0$ and $E[(f'(X, \theta)/f(X, \theta))^2] > 0$.

Condition 3. For every $\theta$ in I, $\left|\frac{\partial^3}{\partial\theta^3} \log f(x, \theta)\right| < M(x)$ and
$E\,M(X) < k$ where $k$ is independent of $\theta$.

9.7. COMPLETENESS 

This is basically a property of families of probability
functions. This was introduced in section 3.6. But here
we will consider the ideas of complete statistics or complete
estimators and complete sufficient estimators. In section 9.3
we discussed the concept of 'unbiasedness' of an estimator. If
$T_1$ and $T_2$ are two unbiased estimators for a function $h(\theta)$ of $\theta$
then $E\,T_1 = h(\theta) = E\,T_2$ and $E(T_1 - T_2) = 0$. For any two statistics
$T_1$ and $T_2$, if $E(T_1 - T_2) = 0$ implies $T_1 - T_2 = 0$ almost everywhere
then $T_1 - T_2 = 0$ is the only estimator for zero. This induces a
uniqueness for the unbiased estimator for $h(\theta)$. Incidentally if a
statistic T exists such that $E(T) = h(\theta)$ then $h(\theta)$ is said to be an
estimable parametric function. If for any statistic T with
probability function $f(t, \theta)$ where $\theta \in \Omega$ (parameter space), $E\,g(T) = 0$
for all $\theta \in \Omega$ implies that $g(t) = 0$ almost everywhere then T is
called a complete statistic, where $g(T)$ is a real function of T and
E denotes 'mathematical expectation'. If T is sufficient also then
T is a complete sufficient statistic.


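Ex. 9.7.1. Show that the sample mean of a random sample
from a $N(\mu, 1)$ is a complete sufficient estimator for $\mu$.

Sol. Sufficiency of the sample mean $\bar{X}$ was seen in Ex. 9.6.1.
$\bar{X} : N(\mu, 1/\sqrt{n})$

$f(\bar{x}) = (n/2\pi)^{1/2}\, e^{-n(\bar{x} - \mu)^2/2}$

$-\infty < \bar{x} < \infty$, and $\mu \in \Omega$
where $\Omega = \{\mu \mid -\infty < \mu < \infty\}$.

Let $g(\bar{X})$ be a function of $\bar{X}$ and let $E\,g(\bar{X}) = 0$.

$\Rightarrow (n/2\pi)^{1/2} \int_{-\infty}^{\infty} g(\bar{x})\, e^{-n(\bar{x} - \mu)^2/2}\, d\bar{x} = 0$    (9.28)

$\Rightarrow \int_{-\infty}^{\infty} g(\bar{x})\, e^{-n(\bar{x} - \mu)^2/2}\, d\bar{x} = 0$    (9.29)

$\Rightarrow \int_{-\infty}^{\infty} g(\bar{x})\, e^{-n\bar{x}^2/2}\, e^{n\mu\bar{x}}\, d\bar{x} = 0$ (since $e^{-n\mu^2/2} \neq 0$)    (9.30)

$\Rightarrow g(\bar{x})\, e^{-n\bar{x}^2/2} = 0$ (This is obtained by taking a
Laplace transform. See problems 4.51 and 4.52)    (9.31)

$\Rightarrow g(\bar{x}) = 0$ (since $e^{-n\bar{x}^2/2} \neq 0$)    (9.32)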

Comments. A reader who is not familiar with Laplace and 
Fourier transforms may omit this section. Since many proofs of 
completeness involve complicated transformations this section will 
not be elaborated. 

9.8. INVARIANCE 

This is a desirable property for the estimators. A rigorous 
definition and complete explanation of ‘Invariance’ involves 
Mathematics beyond the pre-requisite of this book. So 
only the basic ideas are introduced through some examples 
in this section. Consider the following problem of estimation. 
A population mean $\mu$ is estimated by using an observed
random sample $x_1, ..., x_n$. A number of estimates can be given for
$\mu$. Let $T(x_1, ..., x_n)$ be an estimate for $\mu$. Suppose that the
observations are measurements in feet and we would like to have
the estimate in inches. The new observations are $12x_1, ..., 12x_n$
in inches and the estimate is $T(12x_1, ..., 12x_n)$. A very desirable
property of the estimate in this problem is

$T(12x_1, ..., 12x_n) = 12\, T(x_1, ..., x_n)$,

since $\mu$ feet = $12\mu$ inches. Consider two estimators $T_1(X_1, ..., X_n)
= \sum X_i / n$ and $T_2(X_1, ..., X_n) = \sum X_i^2 / n$ for $\mu$. $T_1$ satisfies the
invariance requirement since $T_1(aX_1, ..., aX_n) = a\, T_1(X_1, ..., X_n)$ for
every scalar $a$. But $T_2(aX_1, ..., aX_n) \neq a\, T_2(X_1, ..., X_n)$ and hence
$T_2$ does not satisfy the invariance requirement with respect to a
scale transformation. If an estimator $T(X_1, ..., X_n)$ for $\mu$ is such
that $T(X_1 + c, ..., X_n + c) = c + T(X_1, ..., X_n)$ for all real $c$ then T, as
an estimator for $\mu$, satisfies the invariance requirement with
respect to a translation or a change in the location. It can be
seen that the sample mean, as an estimator for the population
mean, satisfies the invariance requirements with respect to a scale
as well as location transformation. In general we can define the
invariance property of an estimator for a parametric function, with
respect to some general transformations satisfying some conditions.
This also leads to a method of estimation called the
‘invariance method’ by which one can get estimators satisfying 
the invariance properties. This will not be discussed here. In 
the invariance problems, even though the estimates from the 
original observations and from the transformed observations may 
be different, the structure of the estimation procedure remains the 
same, in the sense that the family of distributions remains the 
same. This is why the procedure is called the ‘invariance 
method’. 
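
The scale and location requirements described in this section can be tried on a small data set. The sketch below is only an illustration in Python: the observations are made up, and it assumes that the second estimator is the mean of the squared observations.

```python
import math

def t1(xs):
    """Sample mean."""
    return sum(xs) / len(xs)

def t2(xs):
    """Mean of the squared observations (assumed form of the second estimator)."""
    return sum(x * x for x in xs) / len(xs)

# Illustrative measurements in feet, not taken from the text.
feet = [1.0, 2.0, 4.0]
inches = [12 * x for x in feet]       # scale transformation with a = 12
shifted = [x + 5.0 for x in feet]     # location transformation with c = 5

scale_ok_t1 = math.isclose(t1(inches), 12 * t1(feet))
scale_ok_t2 = math.isclose(t2(inches), 12 * t2(feet))
location_ok_t1 = math.isclose(t1(shifted), 5.0 + t1(feet))

print(scale_ok_t1, scale_ok_t2, location_ok_t1)
```

For these numbers the sample mean passes both checks, while the mean of squares is multiplied by 144 rather than 12 under the change from feet to inches.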

Exercises 

9.8. Obtain an unbiased estimator of $p^2 + 2$ where $p$ is the parameter
of the binomial distribution

$f(x) = \binom{N}{x} p^x (1 - p)^{N - x}$ for $x = 0, 1, ..., N$; $0 < p < 1$
$f(x) = 0$ elsewhere.

N is known.

[Hint : Obtain the second factorial moment].

9.9. If $x_1, ..., x_n$ is a random sample from a Poisson distribution with
parameter $\lambda$, show that $\bar{X}$ is an unbiased and minimum variance estimator
of $\lambda$.

9.10. If $\bar{X}$ is the sample mean of a random sample from a $N(\mu, 1)$
show that $3\bar{X} + 2$ is a sufficient statistic for $\mu$.

9.11. If $\bar{X}$ and $S^2$ are the mean and variance of a random sample of
size $n$ from a $N(\mu, \sigma)$ show that $2\bar{X}$ and $3S^2$ are jointly sufficient for $\mu$ and $\sigma^2$.

9.12. If $\hat{\theta}$ is a consistent estimator of $\theta$ and if N is the sample size,
show that $(N - a)\hat{\theta}/(N - b)$ is also consistent for $\theta$, where $a$ and $b$ are
constants.

9.13. In a Cauchy distribution,

$f(x, \theta) = 1/\pi\{1 + (x - \theta)^2\}$, $-\infty < x < \infty$

show that the sample mean is not a consistent estimator of $\theta$.

9.14. Obtain the minimum variance unbiased estimator of $p$, if it
exists, in a binomial distribution.

$f(x, p) = \binom{N}{x} p^x (1 - p)^{N - x}$ for $x = 0, 1, ..., N$; $0 < p < 1$,
$f(x, p) = 0$ elsewhere.

N is known.

9.15. If $X_1, X_2, X_3, X_4$ is a random sample of size 4 from a Poisson
distribution with parameter $\lambda$, show that $\hat{\theta}_1 = (X_1 + X_2 + X_3 + X_4)/4$ and
$\hat{\theta}_2 = (2X_1 + 3X_2)/5$ are both unbiased. Which one is relatively more efficient?
Are they sufficient for $\lambda$?

9.16. Show that the sample mean is an unbiased and consistent
estimator of $\theta + \frac{1}{2}$ for the following distribution.

$f(x) = 1$ for $\theta < x < \theta + 1$
$f(x) = 0$ elsewhere.

9.17. If L is the likelihood function when a random sample of size $n$
is taken from a population $f(x, \theta)$ having a parameter $\theta$, then

$I = E\left(\frac{\partial}{\partial\theta} \log L\right)^2$

is sometimes called the amount of information about $\theta$ in the sample. Show
that

$I = E\left(\frac{\partial}{\partial\theta} \log L\right)^2 = -E\left(\frac{\partial^2}{\partial\theta^2} \log L\right) = n\, E\left[\frac{\partial}{\partial\theta} \log f(x)\right]^2$.

(Assume the regularity conditions mentioned in section 9.6).

9.18. Under the assumptions in problem 9.17 and using the result

$[\mathrm{Cov}(X, Y)]^2 \le \mathrm{Var}(X)\, \mathrm{Var}(Y)$, show that

$\mathrm{Var}(\hat{\theta}) \ge 1/I$

where I is defined in problem 9.17 and $\hat{\theta}$ is an unbiased estimator of $\theta$ in a
population $f(x, \theta)$. This inequality is called the Cramér-Rao inequality.
More general forms of the inequality can be given.


9.19. If $E(\hat{\theta}) = \theta + b(\theta)$ where $b(\theta)$ is a function of $\theta$ and if
$b'(\theta) = \frac{\partial}{\partial\theta} b(\theta)$, show that

$\mathrm{Var}(\hat{\theta}) \ge [1 + b'(\theta)]^2 / I$.

9.20. If $E(\hat{\theta}) = \theta$ and $\frac{\partial}{\partial\theta} \log L = k(\hat{\theta} - \theta)$ show that $\hat{\theta}$ is the minimum
variance unbiased estimator of $\theta$ (see theorem 9.2).

9.21. Show that the statistic $X_1 + X_2$ is complete and sufficient for $\theta$,
where $X_1, X_2$ is a random sample of size 2 from a population defined as
follows.

$f(1) = \theta$, $f(0) = 1 - \theta$ and $f(x) = 0$ elsewhere, where $0 < \theta < 1$.

9.22. Show that the sample mean, as an estimator for the population
mean $\mu$, under the transformation $x \to y = ax + b$, $a > 0$, $-\infty < b < \infty$ (that
is, whenever there is an observation $x$ we consider a new quantity $y = ax + b$,
and this transformation is denoted by $g$), satisfies the invariance
requirements.

9.23. Give an estimator T for the parameter $\theta$ of an exponential
distribution, which is not (1) unbiased, (2) sufficient, (3) complete, (4) satisfying
the invariance requirements under a change in the scale.

CHAPTER 10


TEST OF HYPOTHESES 

10.0. Introduction. In the last two chapters we discussed 
some problems of statistical inference, namely, estimation of para¬ 
meters and setting up of confidence intervals. In this chapter we 
shall consider the problem of testing statistical hypotheses. This 
remarkable aspect of statistical theory had led some people to 
claim that anything and everything can be proved by statistics. 
By using statistical methods we do not prove anything but these 
methods help us to make decisions in situations where there is a 
lack of certainty. There are many practical situations where we 
would like to take a decision for further action. There are other 
problems where we would like to determine whether some specific 
claims are acceptable or not. Suppose that we want to test the
following claims.

1. A particular toothpaste reduces cavities by 46%.

2. A particular drug raises the survival rate from a disease
to 95%.

3. The appointment of a new salesgirl increases average
sales by $200 a day.

4. A bird watcher claims that during the spring season on
the average, birds lay more eggs in Quebec than in Ontario.

5. Detergent A is more powerful than detergent B.

6. The fertilizers $F_1$, $F_2$, $F_3$ and $F_4$ are equally effective as
far as the yield of a particular variety of corn is concerned.

7. Detergent A outcleans all other detergents.

These are a few of the many varieties of problems whose
solutions demand the help of a statistician. The first problem is
testing the hypothesis that a binomial probability is 0.46. The
second problem may also be considered to be a binomial probability
situation. If the increase in sales in problem 3 is assumed to
be normally distributed with mean $\mu$ and with a known variance
then it is a problem of testing the hypothesis $\mu = \mu_0 = 200$.
Similarly problem 4 may be transformed to a problem of testing a
hypothesis about the means in two populations. A statistical
hypothesis of this nature is only a restriction on the estimable
parameters (parameters for which unbiased estimators exist) of a
probability distribution. Some non-parametric hypotheses (which
are not restrictions on estimable parameters) will be considered
in the last chapter. In problem 1 when the hypothesis that the
true proportion is 0.46 is tested it is tested against an alternative
that either the true proportion is less than 0.46 or greater than
0.46 or not equal to 0.46. So a statistical hypothesis and the
alternative hypothesis may be formulated as follows.

$H_0 : \theta = \theta_0$

$H_1 : \theta = \theta_1$    (10.1)

where $H_0$ (we may call it the null hypothesis) is the hypothesis to
be tested against the alternative $H_1$.

10.1. SIMPLE AND COMPOSITE HYPOTHESES 

If a null hypothesis completely specifies a distribution (that
is, the functional form as well as the parameters) then it is called
a simple hypothesis; otherwise it is called composite. For example
if we want to test the hypothesis $H_0 : \mu = 5$ in a normal population
$N(\mu, 1)$ then $H_0$ is a simple hypothesis. If the alternative is
$H_1 : \mu = 10$ then the alternative is also simple. If the alternative
is $\mu < 5$ then there are a number of possible values for $\mu$ in $H_1$ and
hence the alternative is composite. If $H_0 : \mu > 5$ then $H_0$ is
composite. If $H_0$ is $\mu = 5$ in a $N(\mu, \sigma)$ where $\sigma$ is unknown, again $H_0$
is composite since $H_0$ does not completely determine the population.

$H_0 : \theta = \theta_0$, $H_1 : \theta = \theta_1$. The null hypothesis is simple and the alternative is also simple.    (10.2)

$H_0 : \theta = \theta_0$, $H_1 : \theta < \theta_0$. $H_0$ simple and $H_1$ composite and one sided.    (10.3)

$H_0 : \theta = \theta_0$, $H_1 : \theta > \theta_0$. $H_0$ simple and $H_1$ composite and one sided.    (10.4)

$H_0 : \theta = \theta_0$, $H_1 : \theta \neq \theta_0$. $H_0$ simple and $H_1$ composite and two sided.    (10.5)

where $\theta_0$ and $\theta_1$ are specific values of $\theta$ and $\theta$ denotes the
parameter in a given population.

10.2. TYPE I AND TYPE II ERRORS 

When a hypothesis $H_0$ is tested against an alternative $H_1$
usually there can arise one of the two types of errors, namely to
reject $H_0$ when $H_0$ is true and to accept $H_0$ when $H_1$ is true. These
are called the type I and type II errors respectively. They are
illustrated in the following table. Here we will assume, for the
time being, that rejection of $H_0$ is equivalent to acceptance of $H_1$
and vice versa. For example if $H_0 : \mu = 20$ in a $N(\mu, 1)$ is rejected
in favour of the alternative $H_1 : \mu = 30$ then automatically $H_1$ is
accepted.


                        Accept $H_0$            Reject $H_0$
                        (reject $H_1$)          (accept $H_1$)

$H_0$ is true             Correct decision      Type I error
($H_1$ is not true)

$H_0$ is not true         Type II error         Correct decision
($H_1$ is true)

In a particular testing procedure we would like to control the type I
error as well as the type II error. The probabilities of committing the
type I and type II errors are called the sizes of the two errors and are
denoted by $\alpha$ and $\beta$ respectively.

Type I error: size $\alpha$

Type II error: size $\beta$

... is 4000 k.w. or more reject $H_0$, otherwise accept $H_0$.

Sol. The distribution is

$f(x, \theta) = \frac{1}{\theta} e^{-x/\theta}$ for $x > 0$, $\theta > 0$
$f(x, \theta) = 0$ elsewhere.

According to the test criterion, $H_0$ is rejected if $x \ge 4000$.

$\alpha$ = probability of rejecting $H_0$ when $H_0$ is true
 = probability of rejecting $H_0$ when $\theta = 1000$

$= \int_{4000}^{\infty} \frac{1}{1000}\, e^{-x/1000}\, dx = e^{-4000/1000} = e^{-4}$

$\beta$ = probability of accepting $H_0$ when $H_1$ is true
 = probability of accepting $H_0$ when $\theta = 2000$.

But $H_0$ is accepted when $x < 4000$ and hence

$\beta = \int_{0}^{4000} \frac{1}{2000}\, e^{-x/2000}\, dx = 1 - e^{-4000/2000} = 1 - e^{-2}$

The two probabilities are illustrated in Fig. 10.1. 



Fig. 10.1. 
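
The two error sizes in this example can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the original solution: the helper names are illustrative, and the crude midpoint rule is included only as an independent check of the closed form for $\beta$.

```python
import math

def exponential_density(x, theta):
    """Density (1/theta) * exp(-x/theta) for x > 0."""
    return math.exp(-x / theta) / theta

def midpoint_integral(func, lower, upper, steps=100_000):
    """Simple midpoint-rule approximation of a definite integral."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

# Type I error: reject (x >= 4000) although theta = 1000.
alpha = math.exp(-4000 / 1000)

# Type II error: accept (x < 4000) although theta = 2000.
beta = 1 - math.exp(-4000 / 2000)

# Numerical check of beta by integrating the density over (0, 4000).
beta_numeric = midpoint_integral(lambda x: exponential_density(x, 2000), 0, 4000)

print(alpha, beta, beta_numeric)
```

Numerically, $\alpha$ is about 0.018 and $\beta$ is about 0.865, so this particular cutoff keeps the type I error small at the cost of a large type II error.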

Ex. 10.2.2. A coin is thrown 10 times. Suppose that the
hypothesis $H_0 : p = 1/2$ is rejected in favour of the alternative $H_1 : p
= 2/3$ if 8 or more independent trials give heads, where $p$ denotes the
probability of getting a head in any trial. Determine the sizes of
type I and type II errors.

Sol. This is clearly a binomial probability situation.

$\alpha$ = probability of rejecting $H_0$ when $H_0$ is true
 = probability of rejecting $H_0$ when $p = 1/2$.

But $H_0$ is rejected when 8 or more trials give heads.

$\alpha = \binom{10}{8}(1/2)^8(1/2)^2 + \binom{10}{9}(1/2)^9(1/2)^1 + \binom{10}{10}(1/2)^{10}(1/2)^0$

$= 56/2^{10}$.

$\beta$ = probability of accepting $H_0$ when $H_1$ is true
 = probability of accepting $H_0$ when $p = 2/3$.

$H_0$ is accepted when the number of heads is less than 8.

$\beta = \binom{10}{0}(2/3)^0(1/3)^{10} + \binom{10}{1}(2/3)^1(1/3)^9 + ... + \binom{10}{7}(2/3)^7(1/3)^3$

$= 1 - \left[\binom{10}{8}(2/3)^8(1/3)^2 + \binom{10}{9}(2/3)^9(1/3)^1 + \binom{10}{10}(2/3)^{10}(1/3)^0\right]$

$= 1 - 17664/3^{10}$.
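
The binomial sums in this example are easy to confirm with exact rational arithmetic. The following Python sketch uses only the standard library; the function names are illustrative and not part of the text.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k heads in n independent trials with head probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def rejection_probability(n, p, threshold):
    """P(number of heads >= threshold)."""
    return sum(binomial_pmf(k, n, p) for k in range(threshold, n + 1))

# Size of the type I error: reject when p = 1/2.
alpha = rejection_probability(10, Fraction(1, 2), 8)

# Size of the type II error: accept (fewer than 8 heads) when p = 2/3.
beta = 1 - rejection_probability(10, Fraction(2, 3), 8)

print(alpha, float(alpha))
print(beta, float(beta))
```

In decimal terms $\alpha$ is about 0.055 and $\beta$ is about 0.70.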

10.21. The Critical Region. In the examples 10.2.1 and
10.2.2 the null hypothesis $H_0$ is tested on the basis of a sample of
size one and 10 respectively. In Ex. 10.2.1, $H_0$ is rejected if the
observed sample point (in this case the single observation) falls
above 4000. The outcome set in this experiment of taking a single
observation may be represented by the line $(0, \infty)$ since our
observations are all positive because they are the consumption of
electricity on various days.


Fig. 10.2.

With regard to the test criterion in this example the outcome set
(as this is a sampling problem we may very well call the outcome
set the sample space) may be partitioned into two. $H_0$ is rejected
if the observed sample point falls in one part and $H_0$ is accepted
otherwise. This is illustrated in Fig. 10.2. The region of rejection
of $H_0$ when $H_0$ is true, or that region of the outcome set where
$H_0$ is rejected if the sample point falls in that region, is called
the critical region of the test. The probability that the sample
point falls in the critical region is called the size of the critical
region of the test. Evidently the size of the critical region is $\alpha$ =
the probability of committing the type I error. If our test was
based on a sample of size 2 in Ex. 10.2.1, then the outcome set
or the sample space is the first quadrant in a two-dimensional
space and a test criterion will enable us to separate our outcome
set into two subsets. If the sample point falls in one subset,
$H_0$ is rejected and $H_0$ is accepted otherwise. This is illustrated
in Fig. 10.3 (a). In general if the outcome set is represented by a
Venn diagram then the critical region is as shown in Fig. 10.3 (b).

Fig. 10.3 (b). Outcome set.

The critical region is usually denoted by C. In Ex. 10.2.1,
the critical region C is the interval $(4000, \infty)$. This is shown in
Fig. 10.1. In Ex. 10.2.2 the outcome set may be given by a set
of vectors of order 10 with the elements 1 and 0 where 1 denotes
a head and 0 denotes a tail. The subset of vectors having 8 or
more unities is the critical region C.

10.22. The best test for a simple hypothesis. Often the
test criterion is to be determined by controlling $\alpha$ and $\beta$.
If $\alpha$ and $\beta$ can be simultaneously minimized it is desirable.
But if $\alpha$ is minimized usually $\beta$ becomes large and vice versa. This
may be noticed from Fig. 10.1. Therefore a common practice is
to select a test criterion which minimizes $\beta$ for a fixed $\alpha$. If there
exists a test criterion which makes $\beta$ a minimum for a fixed $\alpha$ then
such a test is called the best test (best in the above sense). The
existence of such a test when a simple hypothesis $H_0$ is tested
against a simple alternative $H_1$, is given by a theorem due to
J. Neyman and E.S. Pearson, which will be stated later.
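
The trade-off between the two error sizes can be seen numerically in the setting of Ex. 10.2.1. The Python sketch below is an illustration under stated assumptions: it keeps the exponential model with $\theta = 1000$ under the null hypothesis and $\theta = 2000$ under the alternative, and varies the rejection cutoff; the cutoff values and function names are chosen here for illustration only.

```python
import math

def type_one_size(cutoff, theta0=1000.0):
    """alpha: probability that an observation is >= cutoff when theta = theta0."""
    return math.exp(-cutoff / theta0)

def type_two_size(cutoff, theta1=2000.0):
    """beta: probability that an observation is < cutoff when theta = theta1."""
    return 1.0 - math.exp(-cutoff / theta1)

cutoffs = [1000, 2000, 4000, 8000]
table = [(c, type_one_size(c), type_two_size(c)) for c in cutoffs]

for cutoff, a, b in table:
    print(f"cutoff={cutoff:5d}  alpha={a:.4f}  beta={b:.4f}")
```

As the cutoff increases, $\alpha$ falls while $\beta$ rises, which is why one of them is fixed in advance and only the other is minimized.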

Exercises 

10.1. A test rejects the hypothesis that the probability of a son being
born to a couple is 1/2, if the first two children are girls. Construct the
outcome set and the critical region for this test. (Assume that we consider 10
children couples).

10.2. In a population $N(\mu, 1)$ the hypothesis that $\mu = 2$ is rejected if a
random sample of size 2 has a mean greater than 5. Obtain the sample
space and the critical region for the test.

10.3. A coin is thrown 3 times. The hypothesis that the probability
$p$ of getting a head in any trial is 1/2 is rejected in favour of the hypothesis
that $p = 1/3$ if the three trials result in tails. Obtain the probabilities of the
type I and the type II errors.

10.4. In a township the milk consumption of the families is assumed to
be exponentially distributed with the parameter $\theta$. The hypothesis $H_0 : \theta = 5$
is rejected in favour of $H_1 : \theta = 10$ if a family selected at random consumes 15
units or more. Obtain the critical region and the probabilities $\alpha$ and $\beta$ of the
type I and type II errors.

10.5. The hypothesis $H_0 : \mu = 50$ is tested against the alternative
$H_1 : \mu = 60$ by using a sample of size $n$ from the population $N(\mu, \sigma = 5)$. How
large should $n$ be if the probabilities of the type I and type II errors are
$\alpha = 0.025$ and $\beta = 0.01$ respectively?

10.3. THE POWER OF A TEST

If the size of the critical region C is fixed (that is, $\alpha$ is fixed)
then that test which minimizes $\beta$ may be called the best test in
the case of a simple $H_0$ against a simple $H_1$. $1 - \beta$ = the probability
of rejecting $H_0$ when it is not true. This is a correct decision
and we would like to have $1 - \beta$ as close to one as possible. Hence
$1 - \beta$ is often called the power of the test whose critical region is C.
A test criterion uniquely determines the critical region. This is
seen in Section 10.21. Hence the power of a test may be called
the power of a critical region, and $\alpha$ is the size of the critical region
or the probability that the sample point will fall in the critical
region when $H_0$ is true. We can have a number of tests for a null
hypothesis $H_0$ with the same $\alpha$ but with different critical regions.
This enables us to select that test for which the power $1 - \beta$ is a maximum.

Ex. 10.3.1. A box contains 10 marbles, out of which $\theta$ are red
and the rest are green. We want to test the hypothesis $H_0 : \theta = 5$
against the alternative $H_1 : \theta = 4$. Determine the size of the critical
regions and the power of the tests A and B.

Test A. Take two marbles at random with replacement and
reject $H_0$ if both marbles are of the same colour.

Test B. Take two marbles at random with replacement and
reject $H_0$ if both marbles are of different colours.

Sol. $H_0 : \theta = 5$

For test A, $\alpha$ = probability of rejecting $H_0$ when $H_0$ is true
 = probability of rejecting $H_0$ when $\theta = 5$
 = probability of getting 2 marbles of the same
   colour when there are 5 red marbles

$= \binom{2}{2}(1/2)^2(1/2)^0 + \binom{2}{0}(1/2)^0(1/2)^2$

$= 2(1/2)^2 = 1/2$.

For test B, $\alpha$ = probability of getting 2 marbles of different
   colours when there are 5 red marbles

$= \binom{2}{1}(1/2)^1(1/2)^1 = 1/2$.

For test A, $1 - \beta$ = probability of rejecting $H_0$ when $H_1$ is true
 = probability of getting 2 marbles of the same
   colour when there are 4 red marbles

$= \binom{2}{2}(4/10)^2(6/10)^0 + \binom{2}{0}(4/10)^0(6/10)^2 = 13/25$.

For test B, $1 - \beta = \binom{2}{1}(4/10)^1(6/10)^1 = 12/25$.



             $\alpha$        $1 - \beta$

Test A       1/2      13/25

Test B       1/2      12/25

Comments. Test A is more powerful than test B and
hence B may be called non-admissible. In other words the critical
region corresponding to A is more powerful than the critical region
in B. This example is only for illustration. In a practical situation
where we would like to test $H_0 : \theta = 5$, we would take out all
the marbles and count the number of red marbles. Sampling
procedure is not needed everywhere.

When the alternative hypothesis is composite then $1 - \beta$
may be evaluated for each value of the parameter specified by $H_1$.
For example consider the hypothesis $H_0 : \theta = \theta_0$ against the
alternative $H_1 : \theta \neq \theta_0$. For each $\theta$ not equal to $\theta_0$ we can evaluate
$1 - \beta$ or the power of a test. If $1 - \beta$ is plotted against $\theta$, such
a curve is called the power curve of a test. Let Fig. 10.4 give
the power curves of three tests A, B and C with the same $\alpha$ for
testing $H_0 : \theta = \theta_0$ against $H_1 : \theta \neq \theta_0$.



Fig. 10.4. 


We would like to have $1 - \beta$ as close to one as possible. Test
A is more powerful than test B or C for $\theta > \theta_0$. Test B is uniformly
more powerful than test C since the power curve for B lies closer
to the line $1 - \beta = 1$ at every point than that of test C. Test A is
less powerful than test B or C for $\theta < \theta_0$. Test C is said to be
non-admissible compared to test B. If there exists a test which is
uniformly more powerful than any other test it is called the
uniformly most powerful or the best test. A method of obtaining
the most powerful test whenever it exists, will be given in a later
theorem.

Ex. 10.3.2. Draw the power curve of the test in Ex. 10.2.1,
testing $H_0 : \theta = 1000$ against the one sided alternative
$H_1 : \theta > 1000$.

Sol. In 10.2.1, the parent distribution is exponential
and the test rejects the null hypothesis if an observed value is
greater than or equal to 4000.

$\alpha = \int_{4000}^{\infty} \frac{1}{1000}\, e^{-x/1000}\, dx = e^{-4}$

The power of this test = $1 - \beta$ = probability of rejecting $H_0$
when $H_1$ is true

i.e., $1 - \beta = \int_{4000}^{\infty} \frac{1}{\theta}\, e^{-x/\theta}\, dx = e^{-4000/\theta}$

where $\theta$ is any value $> 1000$ (since $H_1$ is $\theta > \theta_0 = 1000$).

For $\theta$ = 2000, 3000, 4000 etc., $1 - \beta$ will be $e^{-2}$, $e^{-4/3}$, $e^{-1}$ etc.
The power curve is obtained by plotting $1 - \beta$ against $\theta$ where
$\theta > 1000$. This is given in Fig. 10.5.



Comments. This is the power curve of a test which is
based on a single observation. For different tests we can draw
the corresponding power curves. Instead of the power curve if
the probability of accepting $H_0$ when $H_1$ is true is plotted against
$\theta$ then we get what is usually called an operating characteristic
curve (OC-curve). In industrial applications the OC-curve is more
often used than the power curve in order to compare tests and to
take decisions. The OC-curve for Ex. 10.3.2 is given in Fig. 10.5
as the dotted curve. It is obtained when $\beta$ is plotted against $\theta$.
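
Points on the power curve and on the OC-curve of this test can be tabulated directly. The following Python sketch assumes the single-observation exponential test with rejection at 4000 or more; the grid of parameter values and the function names are chosen here for illustration.

```python
import math

def power(theta, cutoff=4000.0):
    """1 - beta: probability that an exponential observation is >= cutoff."""
    return math.exp(-cutoff / theta)

def oc_value(theta, cutoff=4000.0):
    """beta: probability of accepting the null hypothesis for this theta."""
    return 1.0 - power(theta, cutoff)

thetas = [2000, 3000, 4000, 6000, 8000]
curve = [(theta, power(theta), oc_value(theta)) for theta in thetas]

for theta, one_minus_beta, beta in curve:
    print(f"theta={theta:5d}  power={one_minus_beta:.4f}  OC={beta:.4f}")
```

The power increases towards one as the parameter moves away from 1000, and the OC value is its complement at every point.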

Ex. 10.3.3. A coin is thrown 6 times. The null hypothesis
$H_0 : p = 1/2$ is rejected if 5 or more trials result in heads. Obtain the
power curve of this test if $H_0 : p = 1/2$ is tested against $H_1 : p \neq 1/2$
where $p$ denotes the probability of getting a head in any trial.

Sol. $\alpha$ = probability of rejecting $H_0$ when $H_0$ is true

$= \binom{6}{5}(1/2)^5(1/2)^1 + \binom{6}{6}(1/2)^6 = 7/64$

$1 - \beta$ = probability of rejecting $H_0$ when $H_1$ is true.

$H_1 : p \neq 1/2$. Hence $p$ may be any value between 0 and 1 but
not equal to 1/2. Therefore,

$1 - \beta = \binom{6}{5} p^5 (1 - p)^1 + \binom{6}{6} p^6 (1 - p)^0$

For various values of $p$, $1 - \beta$ may be obtained from a
binomial probability table. The power curve is given in Fig. 10.6.

Comments. If $\beta$ is plotted against $p$ the OC-curve for the
test is obtained. It is given by the dotted curve in Fig. 10.6.
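
In place of a binomial table, the power of this test can be evaluated at any $p$ with a short function. This Python sketch is illustrative only; the function name and the grid of $p$ values are assumptions made here.

```python
import math
from math import comb

def power(p, n=6, threshold=5):
    """Probability of `threshold` or more heads in n throws with head probability p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(threshold, n + 1))

alpha = power(0.5)                      # size of the test
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
values = [power(p) for p in grid]

for p, value in zip(grid, values):
    print(f"p={p:.1f}  power={value:.4f}")
```

On this grid the power rises with $p$, so against alternatives with $p$ below 1/2 the test rejects less often than it does under the null hypothesis itself.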

So far we have been considering the various aspects of the 
theory of testing hypotheses when a test criterion is given. The 
following theorems will help us to obtain different test criteria. 

Theorem 10.1. The Neyman-Pearson Lemma. Consider
the problem of testing a simple hypothesis $H_0 : \theta = \theta_0$ against
a simple alternative $H_1 : \theta = \theta_1$. Let $x_1, x_2, ..., x_n$ be a random
sample from the population $f(x, \theta)$ under consideration. The
likelihood functions under $H_0$ and $H_1$ are

$L_0 = \prod_{i=1}^{n} f(x_i, \theta_0)$    (10.6)

and

$L_1 = \prod_{i=1}^{n} f(x_i, \theta_1)$    (10.7)

respectively. For example if $f(x, \theta) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - \mu)^2/2}$
and if $H_0 : \mu = 5$ and $H_1 : \mu = 10$ then

$L_0 = \frac{1}{(\sqrt{2\pi})^n}\, e^{-\sum_{i=1}^{n} (x_i - 5)^2/2}$

and

$L_1 = \frac{1}{(\sqrt{2\pi})^n}\, e^{-\sum_{i=1}^{n} (x_i - 10)^2/2}$

We have defined the best test in a simple $H_0$ against a simple
$H_1$ case, as that test with given $\alpha$ for which $\beta$ is a minimum.

The Neyman-Pearson lemma says that if there exists a
critical region C of size $\alpha$ and a constant $k$ such that

$\frac{L_0}{L_1} \le k$ inside C    (10.8)

and

$\frac{L_0}{L_1} \ge k$ outside C    (10.9)

then C is the most powerful critical region for testing $H_0 : \theta = \theta_0$
against $H_1 : \theta = \theta_1$.

We have seen that in a problem of testing a simple $H_0$
against simple $H_1$ the selection of a critical region is equivalent
to the selection of a test criterion. The inequalities (10.8) and
(10.9) give the best test whenever it exists. Since $L_0/L_1 = k$ defines
a set of measure zero, in the continuous case the inequalities (10.8)
and (10.9) may be written as

$\frac{L_0}{L_1} < k$ inside C    (10.10)

and

$\frac{L_0}{L_1} > k$ outside C.    (10.11)

Proof. C is of size $\alpha$. Let D be another critical region of
the same size $\alpha$. C and D are two regions in an $n$-dimensional
space (since the sample is of size $n$). A symbolic representation
of C and D is given in Fig. 10.7.

Fig. 10.7.

$\int_C L_0\, dx = \int_D L_0\, dx = \alpha$ (by the definition of a critical region)    (10.12)

where $dx = dx_1 \cdots dx_n$ and $\int$ stands for a multiple
integral over the $n$-dimensional regions C and D.

$C = (C \cap D) \cup (C \cap \bar{D})$ and $D = (C \cap D) \cup (\bar{C} \cap D)$    (10.13)

and further $C \cap D$, $C \cap \bar{D}$, $\bar{C} \cap D$ are disjoint.

Hence equation (10.12) implies that

$\int_{C \cap \bar{D}} L_0\, dx = \int_{\bar{C} \cap D} L_0\, dx$.    (10.14)

$\int_C L_1\, dx = \int_{C \cap D} L_1\, dx + \int_{C \cap \bar{D}} L_1\, dx > \int_{C \cap D} L_1\, dx + \int_{C \cap \bar{D}} \frac{L_0}{k}\, dx$
(since $L_1 > \frac{L_0}{k}$ inside C)

i.e., $\int_C L_1\, dx > \int_{C \cap D} L_1\, dx + \frac{1}{k} \int_{C \cap \bar{D}} L_0\, dx$    (10.15)

$= \int_{C \cap D} L_1\, dx + \frac{1}{k} \int_{\bar{C} \cap D} L_0\, dx$ [by equation (10.14)]    (10.16)

But $\int_{\bar{C} \cap D} \frac{L_0}{k}\, dx > \int_{\bar{C} \cap D} L_1\, dx$ (since $L_1 < \frac{L_0}{k}$ outside C)    (10.17)

Equations (10.16) and (10.17) $\Rightarrow$

$\int_C L_1\, dx > \int_{C \cap D} L_1\, dx + \int_{\bar{C} \cap D} L_1\, dx = \int_D L_1\, dx$

i.e., $\int_C L_1\, dx > \int_D L_1\, dx$    (10.18)

i.e., the power of C is > the power of D.

Hence $\beta$ for C is < $\beta$ for D.

C is the most powerful critical region. The proof for a discrete
case is similar and is left to the reader. The same theorem
can be used in some cases to obtain the best test when testing a
simple $H_0$ against a composite $H_1$. This may be seen from the
following examples. Whenever a sufficient statistic for the
parameter exists it can be easily seen that the best critical region is a
function of the value of the sufficient statistic. For this and other
related topics such as unbiasedness, similar critical regions, most
powerful unbiased tests etc., see the books listed in the
Bibliography at the end of this chapter.

Ex. 10.3.4. $x_1, x_2, ..., x_n$ is a random sample from a $N(\mu, \sigma)$
where $\sigma$ is known. Obtain the best critical region of size $\alpha$ for testing
$H_0 : \mu = \mu_0$ against $H_1 : \mu = \mu_1$ where $\mu_0$ and $\mu_1$ are specified values.

Sol. This is a case of a simple $H_0$ against a simple $H_1$.

Theorem 10.1 can be used to obtain the best test.

$L_0 = \frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\sum_{i=1}^{n} (x_i - \mu_0)^2 / 2\sigma^2}$    (10.19)

$L_1 = \frac{1}{(\sigma\sqrt{2\pi})^n}\, e^{-\sum_{i=1}^{n} (x_i - \mu_1)^2 / 2\sigma^2}$    (10.20)

$\frac{L_0}{L_1} = e^{\left[\frac{n}{2}(\mu_1^2 - \mu_0^2) + (\mu_0 - \mu_1) \sum x_i\right] / \sigma^2}$ (obtained by simplification)    (10.21)

According to the theorem 10.1 the best critical region C is
given by

$\frac{L_0}{L_1} < k$ inside C

and

$\frac{L_0}{L_1} > k$ outside C, where $k$ is a constant.

$\frac{L_0}{L_1} < k \Rightarrow e^{\left[\frac{n}{2}(\mu_1^2 - \mu_0^2) + (\mu_0 - \mu_1) \sum x_i\right] / \sigma^2} < k$    (10.22)

Taking logarithms and simplifying the inequality we get
$(\mu_0 - \mu_1)\, \bar{x} < k'$ where $k'$ is a constant
and $\bar{x} = \sum x_i / n$.    (10.23)

Case I. Let $\mu_1 > \mu_0$.

Then $\mu_0 - \mu_1 < 0$ and division of (10.23) by $\mu_0 - \mu_1$ gives
$\bar{x} > K$ where K is a constant.    (10.24)

The best critical region is given by the inequality (10.24). K
is easily determined since the size of the critical region is $\alpha$,

i.e., $\int_{K}^{\infty} f(\bar{x})\, d\bar{x}$ (when $\mu = \mu_0$) $= \alpha$    (10.25)

But we know that $K = \mu_0 + Z_\alpha \frac{\sigma}{\sqrt{n}}$ where $Z_\alpha$ is given in
Fig. 10.8 which gives the distribution of $Z : N(0, 1)$.

Fig. 10.8.

Hence the test can be stated as follows. When the observed
sample mean $\bar{x} \ge \mu_0 + Z_\alpha \frac{\sigma}{\sqrt{n}}$ reject $H_0$, otherwise accept $H_0$,

i.e., when $\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \ge Z_\alpha$ reject $H_0$    (10.26)

where $\sigma$, $\mu_0$, $n$ are known and $Z_\alpha$ is obtained from a normal
probability table.

Case II. Let $\mu_1 < \mu_0$; then $\mu_0 - \mu_1 > 0$.

Division of (10.23) by $\mu_0 - \mu_1$ gives

$\bar{x} < K'$ inside C    (10.27)

where $K'$ is a constant which can be easily determined.

$\int_{-\infty}^{K'} f(\bar{x})\, d\bar{x}$ (when $\mu = \mu_0$) $= \alpha$    (10.28)

$K' = \mu_0 - Z_\alpha \frac{\sigma}{\sqrt{n}}$.

This is illustrated in Fig. 10.9.

When $\bar{x} \le \mu_0 - Z_\alpha \frac{\sigma}{\sqrt{n}}$ reject $H_0$,

i.e., when $\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \le -Z_\alpha$ reject $H_0$,    (10.29)

otherwise accept $H_0$, where $\sigma$, $\mu_0$, $n$ are known and $Z_\alpha$ is
determined from a normal probability table.

In these two cases we get one sided tests. Usually one-sided
alternatives, that is, alternatives of the type $\theta > \theta_0$ or $\theta < \theta_0$,
lead to one sided tests. Here we have seen that our success in
finding the best test depends on getting a statistic whose distribution
is independent of the parameters. For example, in the above
cases we know that the distribution of $(\bar{X} - \mu_0)/(\sigma/\sqrt{n})$ is independent
of $\mu$ and $\sigma$ and hence $Z_\alpha$ is easily obtained and hence the
inequalities (10.26) and (10.29) are obtained.

Ex. 10.3.5. The yield of a particular crop is assumed to be distributed as a N($\mu$, $\sigma = 2$). A random sample of 9 test plots gives an average yield of 25 units. Test the hypothesis that the true average yield $\mu = 20$ against the alternative that $\mu > 20$ at the 5% level.

Sol. X : N($\mu$, $\sigma$); $\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ : N(0, 1).

The Neyman-Pearson lemma leads to the test criterion that

if $\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \ge Z_{0.05}$ reject $H_0$, otherwise accept $H_0$.

Here $\bar{x} = 25$, $\mu_0 = 20$, $\sigma = 2$, $n = 9$ and $Z_{0.05} = 1.64$; $Z_{0.05}$ is obtained from a normal probability table, i.e.,

$\int_{1.64}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = 0.05$

$\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} = \frac{25-20}{2/3} = 7.5 > 1.64$

Fig. 10.10.

Hence $H_0 : \mu = 20$ is rejected in favour of $H_1 : \mu > 20$.
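The arithmetic of Ex. 10.3.5 can be checked with a short sketch. It is only an illustration under stated assumptions: the function name `upper_tail_z_test` is hypothetical, and the critical point is computed with Python's standard `statistics.NormalDist` instead of being read from a printed table, so it comes out as 1.645 rather than the rounded 1.64 used above.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Test H0: mu = mu0 against H1: mu > mu0 when sigma is known.

    Returns the observed z, the critical point Z_alpha and the decision.
    (Hypothetical helper, written for illustration only.)
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-tail area alpha
    return z, z_alpha, z >= z_alpha

# Data of Ex. 10.3.5: xbar = 25, mu0 = 20, sigma = 2, n = 9, 5% level.
z, z_alpha, reject = upper_tail_z_test(xbar=25, mu0=20, sigma=2, n=9, alpha=0.05)
print(round(z, 2), round(z_alpha, 3), reject)
```

The observed value 7.5 lies far beyond the critical point, so the decision does not depend on whether 1.64 or 1.645 is used.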

Ex. 10.3.6. The weights of the students in a particular grade are assumed to be a N($\mu$, $\sigma = 5$). A random sample of 25 students gives a total weight equal to 1250 units. Test the hypothesis that $\mu = 52$ against the alternative $\mu < 52$ at the 1% level.

Sol. Here $\bar{x} = 1250/25 = 50$, $\mu_0 = 52$, $\sigma = 5$ and $n = 25$.

X : N($\mu$, $\sigma$)

Hence a convenient statistic is

$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ : N(0, 1)

The Neyman-Pearson lemma leads to the test criterion that

if $\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \le -Z_{0.01}$ reject $H_0$; otherwise accept $H_0$.

Here $Z_{0.01} = 2.32$ (obtained from a normal probability table), i.e.,

$\int_{-\infty}^{-2.32} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = 0.01$

$\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} = \frac{50-52}{5/5} = -2 > -2.32$

Fig. 10.11.

Hence $H_0 : \mu = 52$ is accepted.
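The lower-tail version of the test in Ex. 10.3.6 can be sketched in the same way. The helper name `lower_tail_z_test` is an assumption made for this illustration, and the critical point again comes from `statistics.NormalDist` (about 2.326, against the tabulated 2.32).

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(xbar, mu0, sigma, n, alpha):
    """Test H0: mu = mu0 against H1: mu < mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return z, z_alpha, z <= -z_alpha   # reject only in the lower tail

# Data of Ex. 10.3.6: total weight 1250 for 25 students, sigma = 5, 1% level.
xbar = 1250 / 25
z, z_alpha, reject = lower_tail_z_test(xbar, mu0=52, sigma=5, n=25, alpha=0.01)
print(xbar, z, round(z_alpha, 3), reject)
```

Since -2 is not below the critical point, the sketch reaches the same decision as the text: the hypothesis is accepted.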

Theorem 10.2. The Likelihood Ratio Test. This test can be considered to be a generalization of the Neyman-Pearson lemma for testing a simple $H_0$ against a simple $H_1$. It also covers a composite $H_1$, i.e., hypotheses of the type

$H_0 : \theta = \theta_0$ against $H_1 : \theta \ne \theta_0$;
$H_0 : \theta \ge \theta_0$ against $H_1 : \theta < \theta_0$;
$H_0 : \theta^{(1)} = \theta_0^{(1)}$ (some of the parameters are specified) against $H_1$ : etc.

Let max. $L_0$ be the maximum value of the likelihood function L under the null hypothesis $H_0$. This is obtained by substituting the maximum likelihood estimates of the parameters in L under $H_0$. If $H_0$ specifies all the parameters in L then max. $L_0 = L_0$ itself. Let max. L be the maximum of L with respect to the parameters.

$\lambda = \frac{\max L_0}{\max L}$    (10.30)

$\lambda$ is called the likelihood ratio statistic or the $\lambda$-criterion. The likelihood ratio test says that a uniformly most powerful critical region C for testing $H_0$, simple or composite, is usually obtained by the inequality

$\lambda \le k$ inside C    (10.31)

where the constant k is usually determined by the condition that the probability of the type I error $\le \alpha$. (10.31) does not always give a uniformly most powerful test. Sometimes it leads to a non-admissible test. Evidently $0 \le \lambda \le 1$, and intuitively $\lambda$ is a reasonable test statistic for $H_0$, since a small value of $\lambda$ means that the sample is much less likely under $H_0$ than under the best-fitting values of the parameters.

In the case of a composite hypothesis we can formulate the 
hypothesis in the following general terms. 

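$H_0 : \theta \in \omega$

$H_1 : \theta \in \bar{\omega} \cap \Omega$    (10.32)

where $\omega \subset \Omega$ and $\Omega$ is the parameter space. $\Omega$ is the space generated by all possible values of the parameters. For example in an exponential distribution

$f(x) = \frac{1}{\theta}\, e^{-x/\theta}$ for $x > 0$ and $\theta > 0$; $f(x) = 0$ elsewhere,

the parameter $\theta > 0$ and hence $\Omega$ is the open interval 0 to $\infty$,

$\Omega = (0, \infty)$.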






INTRODUCTION TO STATISTICAL MATHEMATICS 


In a normal distribution N($\mu$, $\sigma$), $-\infty < \mu < \infty$ and $\sigma > 0$. Therefore $\Omega$ is the upper half plane if $\mu$ is measured on the x-axis and $\sigma$ on the y-axis. If there are k parameters in a distribution, $\Omega$ is a subspace of a k-dimensional space. In general if $\Omega$ is denoted by a Venn diagram then the hypotheses $H_0$ and $H_1$ are illustrated in Fig. 10.12:

$H_0 : \theta \in \omega$; $H_1 : \theta \in \bar{\omega} \cap \Omega$,

where $\theta$ stands for all the parameters in the distribution under consideration.

Fig. 10.12.


Ex. 10.3.7. Obtain the critical region by using the likelihood ratio test for testing $H_0 : \mu = \mu_0$ against $H_1 : \mu \ne \mu_0$ in a N($\mu$, $\sigma$) where $\sigma$ is known. The probability of the type I error is given to be $\le \alpha$ and a random sample $x_1, x_2, \ldots, x_n$ is observed.

Sol.

$L = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\, e^{-\sum_{i=1}^{n}(x_i-\mu)^2/2\sigma^2}$

Here $\sigma$ is known and $H_0 : \mu = \mu_0$. Therefore

$\max L_0 = L_0 = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\, e^{-\sum_{i=1}^{n}(x_i-\mu_0)^2/2\sigma^2}$    (10.33)

The maximum likelihood estimate of $\mu$ is $\bar{x} = \sum x_i/n$, so that

$\max L = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\, e^{-\sum_{i=1}^{n}(x_i-\bar{x})^2/2\sigma^2}$    (10.34)

$\lambda = \frac{\max L_0}{\max L} = \frac{e^{-\sum (x_i-\mu_0)^2/2\sigma^2}}{e^{-\sum (x_i-\bar{x})^2/2\sigma^2}}$    (10.35)

$= e^{-n(\bar{x}-\mu_0)^2/2\sigma^2}$    (10.36)

$\lambda \le k \Rightarrow e^{-n(\bar{x}-\mu_0)^2/2\sigma^2} \le k$    (10.37)

$\Rightarrow (\bar{x}-\mu_0)^2 \ge k'$, where $k'$ is a constant,    (10.38)

or $|\bar{x}-\mu_0| \ge k''$.    (10.39)

The best critical region is determined by the inequality (10.39) and is obtained by using the result that the probability of the type I error $\le \alpha$.

$\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$ : N(0, 1) under $H_0$.

We know that $P\left\{ \left| \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \right| \ge Z_{\alpha/2} \right\} = \alpha$,

where $Z_{\alpha/2}$ is illustrated in Fig. 9.13.

The critical region is determined by the inequality

$\left| \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \right| \ge Z_{\alpha/2}$    (10.40)

or

$|\bar{x}-\mu_0| \ge Z_{\alpha/2}\, \sigma/\sqrt{n}$.    (10.41)


Ex. 10.3.8. The incomes per week of the citizens in a township are assumed to have a distribution N($\mu$, $\sigma = 5$). A random sample of 25 people shows an average income of $90 per week. Test the hypothesis that the true average income $\mu$ = $100 against the alternative $\mu \ne$ $100. Use $\alpha = 0.05$.

Sol. $\bar{x} = 90$, $n = 25$, $\mu_0 = 100$, $\sigma = 5$ and $Z_{\alpha/2} = 1.96$.

X : N($\mu$, $\sigma$)

$\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ : N(0, 1)

By the likelihood ratio test, if $\left| \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \right| \ge 1.96$, $H_0$ is rejected.

$\left| \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \right| = \frac{|90-100|}{5/5} = 10 > 1.96$

Hence $H_0$ is rejected.

Fig. 10.14
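The equivalence between the criterion $\lambda \le k$ and the two sided criterion on the standardized mean can be illustrated numerically. The sketch below is an assumption-laden illustration, not part of the original solution: the function `lrt_two_sided` is a hypothetical name, and the constant k is taken as $e^{-Z_{\alpha/2}^2/2}$, which follows from (10.36) because $\lambda = e^{-z^2/2}$ with $z = (\bar{x}-\mu_0)/(\sigma/\sqrt{n})$.

```python
from math import exp, sqrt
from statistics import NormalDist

def lrt_two_sided(xbar, mu0, sigma, n, alpha):
    """Likelihood ratio test of H0: mu = mu0 against H1: mu != mu0, sigma known.

    Returns z, lambda, the constant k, and the two (equivalent) decisions.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    lam = exp(-n * (xbar - mu0) ** 2 / (2 * sigma ** 2))  # equation (10.36)
    z_half = NormalDist().inv_cdf(1 - alpha / 2)          # about 1.96 for alpha = 0.05
    k = exp(-z_half ** 2 / 2)                             # lambda <= k  <=>  |z| >= z_half
    return z, lam, k, abs(z) >= z_half, lam <= k

# Data of Ex. 10.3.8.
z, lam, k, reject_by_z, reject_by_lambda = lrt_two_sided(90, 100, 5, 25, 0.05)
print(z, lam, round(k, 4), reject_by_z, reject_by_lambda)

# A hypothetical sample mean close to mu0 leads to acceptance on both criteria.
z2, lam2, k2, rz2, rl2 = lrt_two_sided(99, 100, 5, 25, 0.05)
print(z2, round(lam2, 4), rz2, rl2)
```

Both decisions agree in each case, which is the content of the step from (10.37) to (10.40).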

Comments. It is seen from Ex. 10.3.4, Ex. 10.3.5 and 
Ex. 10.3.7 that usually one sided alternatives lead to one sided 
tests and a two sided alternative leads to a two sided test. This is 
not true in general. 

Theorem 10.3. We shall state the following theorem without proof. Under very general conditions the distribution of $-2 \log_e \lambda$ approaches a $\chi^2$ distribution with its degrees of freedom equal to the number of parameters that are determined by the hypothesis $H_0$, when n is sufficiently large, where $\lambda$ is the likelihood ratio criterion given by equation (10.30).

For example in testing $H_0 : \mu = \mu_0$ and $\sigma = \sigma_0$ against $H_1 : \mu \ne \mu_0$ and $\sigma \ne \sigma_0$ in a N($\mu$, $\sigma$) based on a random sample of size n, $-2 \log_e \lambda$ is approximately a $\chi^2$ with 2 degrees of freedom since two parameters are specified by $H_0$. When the parent population is a N($\mu$, $\sigma$) where $\sigma$ is known we can show that for any n, $-2 \log_e \lambda$ for testing $H_0 : \mu = \mu_0$ against $H_1 : \mu \ne \mu_0$ is a $\chi^2$ with 1 degree of freedom.

In this case

$-2 \log_e \lambda = \frac{n}{\sigma^2}(\bar{x}-\mu_0)^2 = \left( \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \right)^2$  (shown in Ex. 10.3.7)    (10.42)

But when X : N($\mu$, $\sigma$) and $H_0$ is true, $\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$ is a N(0, 1) and therefore

$\left( \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} \right)^2$ is a $\chi^2$ with one degree of freedom.
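The identity (10.42) can be verified numerically by evaluating the two log-likelihoods directly instead of using the simplified form. The data, the value of $\sigma$ and the function name `log_likelihood` below are all hypothetical, chosen only to illustrate the algebra.

```python
from math import log, pi, sqrt

def log_likelihood(xs, mu, sigma):
    """Log of the normal likelihood L for a sample xs with known sigma."""
    n = len(xs)
    return -n * log(sigma * sqrt(2 * pi)) - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

# Hypothetical sample, invented for illustration only.
xs = [9.1, 10.4, 11.2, 8.7, 10.9, 9.8]
mu0, sigma = 10.0, 1.5

xbar = sum(xs) / len(xs)   # maximum likelihood estimate of mu

# -2 log(lambda) = -2 [log max L0 - log max L]
minus_two_log_lambda = -2 * (log_likelihood(xs, mu0, sigma) - log_likelihood(xs, xbar, sigma))

# Square of the standardized sample mean, right-hand side of (10.42)
z = (xbar - mu0) / (sigma / sqrt(len(xs)))

print(minus_two_log_lambda, z ** 2)
```

The two printed numbers agree up to rounding error, whatever sample is substituted.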

Theorem 10.3 does not specify the parent population. According to this theorem, for large n, whatever may be the parent population, the null hypothesis $H_0$ is rejected when

$-2 \log_e \lambda \ge \chi^2_{\alpha, r}$    (10.43)

where $\chi^2_{\alpha, r}$ denotes the point such that $P[\chi^2_r \ge \chi^2_{\alpha, r}] = \alpha$, r stands for the number of parameters specified by $H_0$ (the number of degrees of freedom of the $\chi^2$) and $\alpha$ stands for the level of significance.

Ex. 10.3.9. $X_1, X_2, \ldots, X_k$ are k independent normal variables with means $\mu_1, \mu_2, \ldots, \mu_k$ and with variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$. Random samples of sizes $n_1, n_2, \ldots, n_k$ are taken from these k populations. Test the hypothesis $H_0 : \sigma_1 = \sigma_2 = \ldots = \sigma_k$ at the 100$\alpha$% level.

Sol. Let $x_{i1}, x_{i2}, \ldots, x_{in_i}$ be the observed random sample from the $i$th population, $i = 1, 2, \ldots, k$.

The joint density function of the random sample for the $i$th population

$= \frac{1}{\sigma_i^{n_i} (\sqrt{2\pi})^{n_i}}\, e^{-\sum_{j=1}^{n_i} (x_{ij}-\mu_i)^2/2\sigma_i^2}$    (10.44)

The likelihood function L is therefore given by

$L = \prod_{i=1}^{k} \frac{1}{\sigma_i^{n_i} (\sqrt{2\pi})^{n_i}}\, e^{-\sum_{j=1}^{n_i} (x_{ij}-\mu_i)^2/2\sigma_i^2}$    (10.45)

$= \frac{1}{(\sqrt{2\pi})^{n} \prod_{i=1}^{k} \sigma_i^{n_i}}\, e^{-\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij}-\mu_i)^2/2\sigma_i^2}$, where $n = n_1 + n_2 + \ldots + n_k$.    (10.46)

The maximum likelihood estimates of $\mu_1, \mu_2, \ldots, \mu_k$ and $\sigma_1, \sigma_2, \ldots, \sigma_k$ are obtained by maximizing L with respect to these parameters. The estimates are easily seen to be

$\hat{\mu}_i = \sum_{j=1}^{n_i} x_{ij}/n_i = \bar{x}_i$ and $\hat{\sigma}_i^2 = \sum_{j=1}^{n_i} (x_{ij}-\bar{x}_i)^2/n_i = s_i^2$ (say).    (10.47)

The maximum value of L is obtained by substituting these estimates in L:

$\max L = \frac{1}{(\sqrt{2\pi})^{n} \prod_{i=1}^{k} s_i^{n_i}}\, e^{-n/2}$

$H_0 : \sigma_1 = \sigma_2 = \ldots = \sigma_k = \sigma$ (say).    (10.48)

$H_0$ specifies only that the $\sigma_i$'s are equal, and hence only $k-1$ independent parameters are specified.

The likelihood function L under $H_0$ is

$L_0 = \frac{1}{\sigma^{n} (\sqrt{2\pi})^{n}}\, e^{-\sum_{i}\sum_{j} (x_{ij}-\mu_i)^2/2\sigma^2}$    (10.49)

since $\prod_{i=1}^{k} \sigma_i^{n_i} = \sigma^{n_1+\ldots+n_k} = \sigma^n$ when $\sigma_1 = \ldots = \sigma_k = \sigma$.

The maximum likelihood estimates of the parameters under $H_0$ are obtained by maximizing $L_0$ with respect to $\mu_1, \mu_2, \ldots, \mu_k$ and $\sigma$. The estimates are easily seen to be

$\hat{\mu}_i = \bar{x}_i$ and $\hat{\sigma}^2 = \frac{n_1 s_1^2 + \ldots + n_k s_k^2}{n}$    (10.50)

for $i = 1, 2, 3, \ldots, k$.

max. $L_0$ is obtained by substituting these estimates in $L_0$:

$\max L_0 = \frac{1}{(\sqrt{2\pi})^{n}\, \hat{\sigma}^{n}}\, e^{-n/2}$    (10.51)

$\lambda = \frac{\max L_0}{\max L} = \frac{\prod_{i=1}^{k} s_i^{n_i}}{\hat{\sigma}^{n}} = \frac{\prod_{i=1}^{k} (s_i^2)^{n_i/2}}{\left[ (n_1 s_1^2 + \ldots + n_k s_k^2)/n \right]^{n/2}}$    (10.52)

Since $H_0$ specifies $k-1$ parameters, $-2 \log_e \lambda$ may be assumed to have a $\chi^2$ distribution with $k-1$ degrees of freedom when n is sufficiently large. The hypothesis $H_0$ is rejected when

$-2 \log_e \lambda \ge \chi^2_{\alpha, k-1}$ when n is large, where $\lambda$ is given by (10.52)

and $\chi^2_{\alpha, k-1}$ is the tabulated value of a $\chi^2_{k-1}$ at the 100$\alpha$% level.
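The statistic $-2 \log_e \lambda$ of (10.52) can be computed as in the following sketch. Taking logarithms in (10.52) gives $-2 \log_e \lambda = n \log_e \hat{\sigma}^2 - \sum n_i \log_e s_i^2$, which is what the code evaluates. The three samples are hypothetical, the function name `minus_two_log_lambda` is an assumption, and the critical value 5.991 is the usual tabulated $\chi^2$ point for 2 degrees of freedom at the 5% level.

```python
from math import log

def minus_two_log_lambda(samples):
    """-2 log(lambda) from (10.52): n log(sigma_hat^2) - sum of n_i log(s_i^2)."""
    n = sum(len(s) for s in samples)
    weighted_sum = 0.0   # accumulates n_i * s_i^2
    log_term = 0.0       # accumulates n_i * log(s_i^2)
    for s in samples:
        n_i = len(s)
        mean_i = sum(s) / n_i
        s2_i = sum((x - mean_i) ** 2 for x in s) / n_i   # divisor n_i, as in (10.47)
        weighted_sum += n_i * s2_i
        log_term += n_i * log(s2_i)
    sigma2_hat = weighted_sum / n                         # estimate under H0, (10.50)
    return n * log(sigma2_hat) - log_term

# Hypothetical data for k = 3 populations, invented for illustration only.
samples = [
    [10.1, 9.8, 10.4, 10.0, 9.7, 10.3],
    [12.5, 11.0, 13.2, 10.4, 12.9, 11.6],
    [8.9, 9.4, 9.1, 9.0, 9.3, 8.8],
]
statistic = minus_two_log_lambda(samples)
chi2_crit = 5.991   # tabulated chi-square point, k - 1 = 2 degrees of freedom, 5% level
print(round(statistic, 3), statistic >= chi2_crit)

# Samples with identical spread give lambda = 1, i.e. a statistic of zero.
same_spread = minus_two_log_lambda([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]])
```

With samples this small the $\chi^2$ approximation is rough, so the sketch only illustrates the computation, not a recommended sample size.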

Comments. The problem of testing equality of variances is of practical importance. For small values of $n_i$ a modified statistic, which is a modification of $-2 \log_e \lambda$ and which has an approximate $\chi^2$ distribution with $k-1$ degrees of freedom, is often suggested:

$-2 \log_e \mu \Big/ \left[ 1 + \frac{1}{3(k-1)} \left( \sum_{i=1}^{k} \frac{1}{n_i-1} - \frac{1}{n-k} \right) \right]$ : $\chi^2_{k-1}$ approximately,

where $\mu = \prod_{i=1}^{k} \left[ \frac{n_i s_i^2/(n_i-1)}{\sum_{i=1}^{k} n_i s_i^2/(n-k)} \right]^{(n_i-1)/2}$.    (10.54)


Exercises 


10.6. By using the Neyman-Pearson lemma obtain the best tests for testing the following hypotheses: (a) $H_0 : \sigma = \sigma_0$, $H_1 : \sigma > \sigma_0$ in a N(0, $\sigma$); (b) $H_0 : \theta = \theta_0$, $H_1 : \theta > \theta_0$ in an exponential population; (c) $H_0$ : …, $H_1$ : … in a Binomial population with the parameters n and p. (Assume that the probability of the type I error is $\alpha$ and a sample of size … is taken.)

10.7. Obtain the best test, if it exists, for testing the hypothesis $H_0 : \theta = 5$ against $H_1 : \theta = 6$, in the population $f(x, \theta) = 1/\theta$ for $0 < x < \theta$. Assume that a single observation is taken and $\alpha = 0.05$.

10.8. In the problem 10.7 if the critical region is given as x … 4.3 obtain $\alpha$ and $\beta$.

10.9. For testing the hypothesis $H_0 : \theta = 5$ in the population $f(x, \theta) = 1/\theta$ for $0 < x < \theta$, against the alternative $H_1 : \theta \ne 5$, draw the power curve if the test is based on a single observation and if the critical region is $x \ge 4.5$ or $x \le 0.5$.

10.10. In a shipment of 10 articles $\theta$ are defective. The hypothesis $H_0 : \theta = 5$ is rejected if two articles drawn at random without replacement are either both good or both defective, otherwise the hypothesis is accepted. Obtain $\beta$ if $\theta$ is 0, 1, 2, 3, 4, 6, 7, 8, 9, 10 and plot the power curve and the OC-curve for this test.

10.11. All the 12 students in a class are classified into two groups according to their aptitude for Mathematics. $\theta$ of them are interested in Mathematics and the others are not interested in Mathematics. The hypothesis $H_0 : \theta = 6$ is tested against the alternative $H_1 : \theta \ne 6$ by the following tests: (1) Two students are selected at random with replacement and $H_0$ is rejected if both are either in one group or in the other group; (2) Two students are selected at random with replacement and $H_0$ is rejected if they belong to different groups. Is one of these critical regions non-admissible? Draw the power curves for the two critical regions.

10.12. The hypothesis $H_0 : \sigma = 2$ is tested against $H_1$ : … in a N(0, $\sigma$). Draw the power curve and the OC curve for the critical region if $\alpha = 0.05$, assuming that the test is based on a random sample of size 9.

10.13. N($\mu_1$, 1), N($\mu_2$, 1), …, N($\mu_k$, 1) are k independent populations. Obtain the likelihood ratio test for testing the hypothesis $H_0 : \mu_1 = \ldots = \mu_k$ against the alternative that all the $\mu$'s are not equal, assuming that the test is based on random samples of sizes $n_1, \ldots, n_k$ respectively and the size of the critical region is $\alpha$.

10.14. A random sample of size n is taken from a N($\mu$, $\sigma$). For testing the hypothesis $H_0 : \mu = \mu_0$ show that the likelihood ratio criterion $\lambda$ is a function of a student t. Obtain the distribution of $\lambda$ … .

10.15. Obtain the likelihood ratio criterion for … where N(…), N(…) … . Assume that random samples of sizes $n_1, \ldots, n_k$ are taken from these populations.

10.16. Obtain the likelihood ratio criterion for testing $H_0 : \rho = 0$ where $\rho$ is the correlation coefficient in a bivariate normal population.

10.17. If a hypothesis $H_0 : \theta \le \theta_0$ against $H_1 : \theta > \theta_0$ is tested about the parameter $\theta$ of an exponential distribution what are $\omega$ and $\Omega$?

10.4. TESTS CONCERNING MEANS 

Consider a normal population N($\mu$, $\sigma$) where $\sigma$ is known. The one sided and two sided tests concerning $\mu$ have already been discussed in section 10.3. When $\sigma$ is unknown, for testing

$H_0 : \mu = \mu_0$ against the alternative
(1) $H_1 : \mu < \mu_0$, (2) $H_1 : \mu > \mu_0$, (3) $H_1 : \mu \ne \mu_0$

a student t statistic can be conveniently used,

$\frac{\bar{X}-\mu}{S'/\sqrt{n}}$ : $t_{n-1}$    (10.55)

that is, the statistic $(\bar{X}-\mu)/(S'/\sqrt{n})$ is a student t with $n-1$ degrees of freedom, where $S'^2 = \sum (X_i-\bar{X})^2/(n-1)$ is an unbiased estimator of $\sigma^2$. The Neyman-Pearson lemma leads to the test criteria

$H_0 : \mu = \mu_0$, $H_1 : \mu < \mu_0$: if $\frac{\bar{x}-\mu_0}{s'/\sqrt{n}} \le -t_{\alpha, n-1}$ reject $H_0$    (10.56)

$H_0 : \mu = \mu_0$, $H_1 : \mu > \mu_0$: if $\frac{\bar{x}-\mu_0}{s'/\sqrt{n}} \ge t_{\alpha, n-1}$ reject $H_0$    (10.57)

$H_0 : \mu = \mu_0$, $H_1 : \mu \ne \mu_0$: if $\left| \frac{\bar{x}-\mu_0}{s'/\sqrt{n}} \right| \ge t_{\alpha/2, n-1}$ reject $H_0$    (10.58)

where $t_{\alpha, n-1}$ and $t_{\alpha/2, n-1}$ are the values of a student t with $n-1$ degrees of freedom such that

$\int_{t_{\alpha, n-1}}^{\infty} f(t)\, dt = \alpha$ and $\int_{t_{\alpha/2, n-1}}^{\infty} f(t)\, dt = \alpha/2$ respectively    (10.59)

and $f(t)$ is the density function of a student t with $n-1$ degrees of freedom. These tests are illustrated in Fig. 10.15.

Fig. 10.15. Rejection and acceptance regions of the student t tests for the alternatives $\mu < \mu_0$, $\mu > \mu_0$ and $\mu \ne \mu_0$.

When the sample size n is large (> 30) then $(\bar{X}-\mu)/(S'/\sqrt{n})$ is approximately normally distributed. Therefore when n is large, instead of a student t distribution, a normal distribution can be used.

Ex. 10.4.1. A random sample of size 9 from a N($\mu$, $\sigma$) has mean 20 and variance 16. Test the hypothesis $H_0 : \mu = 25$ against the alternative $H_1 : \mu \ne 25$ at the 5% level.

Sol. $n = 9$, $\bar{x} = 20$, $s^2 = \sum (x_i-\bar{x})^2/n = 16$ and $\mu_0 = 25$.

$s'^2 = \frac{n}{n-1}\, s^2 = \frac{9}{8} \cdot 16 = 18$, so that $s' = \sqrt{18}$.

$\left| \frac{\bar{x}-\mu_0}{s'/\sqrt{n}} \right| = \left| \frac{20-25}{\sqrt{18}/3} \right| = \frac{5}{\sqrt{2}} = 3.54$.

The tabulated value of a student t with $n-1 = 8$ degrees of freedom at the 5% level is $t_{0.025, 8} = 2.31$ (obtained from a student t table),

i.e., $\int_{2.31}^{\infty} f(t)\, dt = 0.025$

where $f(t)$ is the density function of a student t with 8 degrees of freedom. In this problem the alternative $H_1$ is $\mu \ne 25$. Hence $H_0$ is rejected if $|(\bar{x}-\mu_0)/(s'/\sqrt{n})| \ge 2.31$.

The observed value of the student t statistic is 3.54 > 2.31 in absolute value. Hence the hypothesis is rejected.

Comments. If the sample size was > 30 we could have based our test on a standardized normal variable,

$\frac{\bar{X}-\mu_0}{S'/\sqrt{n}}$ : N(0, 1) approximately.    (10.60)

Testing of a simple hypothesis may be explained as follows. If our assumption, that the sample is a random sample from a N($\mu$, $\sigma$), is correct then under the hypothesis $H_0 : \mu = \mu_0$ the statistic $(\bar{X}-\mu_0)/(S'/\sqrt{n})$ is a student t with $n-1$ degrees of freedom. An observed value of this statistic falls between -2.31 and 2.31 with a probability equal to 0.95. In this problem we observe a value of this statistic. The value is -3.54, which falls below -2.31. If our observed value falls below -2.31 or above 2.31, the probability for such an event is less than 0.05. If our hypothesis is correct an improbable event has happened. Hence we will be compelled to reject our hypothesis. Even if the parent population is not normal we can get a normal approximation for large samples by using the central limit theorem. Tests can be based on this approximation.
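The student t statistic of Ex. 10.4.1 can be recomputed with a few lines of code. This is a sketch under stated assumptions: the variable names are hypothetical, and the critical value 2.31 is simply the tabulated $t_{0.025, 8}$ quoted in the example. Note that the denominator uses $s' = \sqrt{18}$, the square root of the unbiased variance estimate, and not $s'^2 = 18$ itself.

```python
from math import sqrt

# Data of Ex. 10.4.1
n = 9
xbar = 20.0
s2 = 16.0          # sample variance with divisor n
mu0 = 25.0

s_prime2 = n / (n - 1) * s2          # unbiased estimate of sigma^2, equals 18
s_prime = sqrt(s_prime2)             # s' itself, about 4.243
t = (xbar - mu0) / (s_prime / sqrt(n))

t_crit = 2.31                        # tabulated t_{0.025, 8} quoted in the example
reject = abs(t) >= t_crit
print(s_prime2, round(t, 3), reject)
```

The statistic is about -3.54, which lies outside the interval from -2.31 to 2.31.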


Exercises 


10.18. A random sample of size 9 from a N($\mu$, $\sigma = 2$) has a mean 50. Test the hypothesis, (1) $H_0 : \mu = 52$ against $H_1 : \mu < 52$; (2) $H_0 : \mu = 52$ against $H_1 : \mu \ne 52$, at the 5% level.

10.19. The time taken by a particle to move from one fixed point to another fixed point is distributed as a N($\mu$, $\sigma = 0.1$). 16 independent trials of this experiment give the average time equal to 10 units. Test the hypothesis that the expected duration of travel in any trial is 9 units against the alternative that it is more, at the 1% level.

10.20. The lifetime of television picture tubes produced by a particular factory is assumed to have an exponential distribution. One tube selected at random had a life time of 950 hrs. Test the factory's claim that the expected life time is 1000 hrs. against the alternative that it is less, at the 1% level.

10.21. Suppose that the energy used by a person for walking a unit distance is given by $e = (1/10)\, w + (1/4)\, A$ where e, w, A denote energy, weight, and age respectively. An experiment conducted on a random sample of 16 twenty year old people from a certain city yields the following results: $\sum e_i = 32$, $\sum e_i^2 = 72$, where $e_i$ denotes the energy used by the $i$th person. Assuming that the e's are distributed as a N($\mu$, $\sigma$), test the hypothesis that the expected weight of the 20 year olds in the city is 120 against the alternative that it is not, at the 1% level.

10.22. A feeding experiment conducted on 100 experimental animals shows an average increase in weight of 10 lbs with a standard deviation of 2 lbs. Test the hypothesis that the expected increase is 12 lbs against the alternative that it is not, at the 5% level.

[Hint : Use a normal approximation].

10.41. Tests Concerning Differences between Means.

There are many problems where we are interested in testing the difference between two populations. Suppose that we want to test hypotheses such as the following: (1) one variety of wheat exceeds the yield of another variety by $\delta$ units; (2) … ; (3) detergent A is more powerful than detergent B; (4) … . All these problems can be considered to be cases of testing the difference between two population means. In this section we will consider a few simple cases of testing problems when the populations are normal.

Consider two random samples of sizes $n_1$ and $n_2$ from two independent populations N($\mu_1$, $\sigma_1$) and N($\mu_2$, $\sigma_2$), where $\sigma_1$ and $\sigma_2$ are known. Suppose that we would like to test the following hypotheses.

TEST OF HYPOTHESES

(a) $H_0 : \mu_1 - \mu_2 = \delta$, $H_1 : \mu_1 - \mu_2 < \delta$
(b) $H_0 : \mu_1 - \mu_2 = \delta$, $H_1 : \mu_1 - \mu_2 > \delta$
(c) $H_0 : \mu_1 - \mu_2 = \delta$, $H_1 : \mu_1 - \mu_2 \ne \delta$    (10.61)

We know that

$Z = \frac{(\bar{X}_1-\bar{X}_2) - \delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ is a N(0, 1).    (10.62)

It can be easily seen that the likelihood ratio technique will give the following test criteria for testing the hypotheses in (10.61).

The null hypothesis $H_0$ is rejected if

$z \le -z_\alpha$ in (a); $z \ge z_\alpha$ in (b); $|z| \ge z_{\alpha/2}$ in (c)    (10.63)

where z is an observed value of Z, and $z_\alpha$ and $z_{\alpha/2}$ are such that

$\int_{z_\alpha}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = \alpha$ and $\int_{z_{\alpha/2}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = \alpha/2$.    (10.64)

If $\sigma_1$ and $\sigma_2$ are unknown then the statistic

$Z = \frac{(\bar{X}_1-\bar{X}_2) - \delta}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$ : N(0, 1) approximately    (10.65)

when $n_1$ and $n_2$ are large, where $\bar{X}_1$ and $\bar{X}_2$ are the sample means and $S_1^2$ and $S_2^2$ are the sample variances,

$S_1^2 = \sum_{i=1}^{n_1} (X_{1i}-\bar{X}_1)^2/n_1$ and $S_2^2 = \sum_{i=1}^{n_2} (X_{2i}-\bar{X}_2)^2/n_2$.

For testing the hypotheses in (10.61) when $n_1$ and $n_2$ are large, tests may be constructed based on the statistic in (10.65).

If the populations are such that $\sigma_1 = \sigma_2$, or if the populations are N($\mu_1$, $\sigma$) and N($\mu_2$, $\sigma$), and if $\sigma$ is known, then the likelihood ratio technique leads to the test statistic

$Z = \frac{(\bar{X}_1-\bar{X}_2) - \delta}{\sigma \sqrt{1/n_1 + 1/n_2}}$ : N(0, 1).    (10.67)

For testing the various hypotheses in (10.61) the test criteria are similar to the ones given in (10.63).
If $\sigma_1 = \sigma_2 = \sigma$ where $\sigma$ is unknown then the likelihood ratio technique will lead to the following statistic t which is a student t with $n_1 + n_2 - 2$ degrees of freedom,

$t = \frac{(\bar{X}_1-\bar{X}_2) - \delta}{S \sqrt{1/n_1 + 1/n_2}}$ : $t_{n_1+n_2-2}$    (10.68)

where $S^2 = (n_1 S_1^2 + n_2 S_2^2)/(n_1 + n_2 - 2)$ and $S_1^2$ and $S_2^2$ are the sample variances. In this case the various test criteria for testing the hypotheses in (10.61) are as follows. Reject the hypothesis $H_0$ if

$t \le -t_{\alpha, n_1+n_2-2}$ in (a); $t \ge t_{\alpha, n_1+n_2-2}$ in (b); $|t| \ge t_{\alpha/2, n_1+n_2-2}$ in (c)    (10.69)

where t is an observed value of the $t_{n_1+n_2-2}$ in (10.68), and $t_{\alpha, n_1+n_2-2}$ and $t_{\alpha/2, n_1+n_2-2}$ are such that

$\int_{t_{\alpha, n_1+n_2-2}}^{\infty} f(t)\, dt = \alpha$ and $\int_{t_{\alpha/2, n_1+n_2-2}}^{\infty} f(t)\, dt = \alpha/2$    (10.70)

where $f(t)$ is the density function of a student t with $n_1 + n_2 - 2$ degrees of freedom. When $n_1 + n_2 - 2 > 30$ the student t statistic in (10.68) approximates a N(0, 1) and therefore the tests in this case can be based on a N(0, 1) variable.

Ex. 10.41.1. Two random samples of sizes 10 and 12 of I.Q.'s of men and women have means 101 and 98 respectively. Assuming that the I.Q.'s are independently normally distributed as N($\mu_1$, $\sigma_1 = 4$) and N($\mu_2$, $\sigma_2 = 3$), test the hypothesis $H_0 : \mu_1 = \mu_2$ against the alternative $H_1 : \mu_1 \ne \mu_2$ at the 5% level.

Sol. $\bar{x}_1 = 101$, $\bar{x}_2 = 98$, $n_1 = 10$, $n_2 = 12$, $\sigma_1 = 4$, $\sigma_2 = 3$, $\alpha = 0.05$.

$X_1$ : N($\mu_1$, $\sigma_1 = 4$)

$X_2$ : N($\mu_2$, $\sigma_2 = 3$)

and

$Z = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ : N(0, 1)    (10.71)

where $\mu_1 - \mu_2 = 0$ under $H_0$.

Hence $H_0$ is rejected if $|z| \ge z_{0.025} = 1.96$; $z_{0.025}$ is obtained from a normal probability table.

When $H_0$ is true

$|z| = \frac{|101 - 98 - 0|}{\sqrt{16/10 + 9/12}} = \frac{3}{\sqrt{2.35}} = 1.957 < 1.96$

The hypothesis is accepted, although the observed value lies very close to the critical point.
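Because the observed value in Ex. 10.41.1 is so close to the critical point, it is worth recomputing the statistic of (10.71) carefully. The sketch below does so; the function name `two_sample_z` is hypothetical and the critical point is computed with `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, delta=0.0):
    """Statistic for H0: mu1 - mu2 = delta when both sigmas are known."""
    return ((xbar1 - xbar2) - delta) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Data of Ex. 10.41.1: variances 16 and 9, sample sizes 10 and 12.
z = two_sample_z(101, 98, 4, 3, 10, 12)
z_half = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96
reject = abs(z) >= z_half
print(round(z, 3), round(z_half, 3), reject)
```

The denominator is $\sqrt{1.6 + 0.75} = \sqrt{2.35}$, which gives a statistic of about 1.957, just inside the acceptance region.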

Comments. If we want to test the hypothesis that $\mu_1 > \mu_2$ we usually formulate the hypothesis $H_0 : \mu_1 = \mu_2$ and test $H_0$ against $H_1 : \mu_1 > \mu_2$. This is why $H_0$ is usually called the null hypothesis. We take the hypothesis $H_0$ in the form $\mu_1 = \mu_2$ only for convenience. In general we may define the null hypothesis as that hypothesis whose false rejection is considered to be a type I error.


Ex. 10.41.2. The average yields of 10 and 20 test plots of two varieties of wheat are 30 and 40 with standard deviations 4 and 6 respectively. If the yields of the two varieties of wheat are assumed to be independently and normally distributed as N($\mu_1$, $\sigma$) and N($\mu_2$, $\sigma$) respectively, test whether there is any significant difference between the average yields of the two varieties, at the 5% level.

Sol. $\bar{x}_1 = 30$, $\bar{x}_2 = 40$, $n_1 = 10$, $n_2 = 20$, $\sigma_1 = \sigma_2 = \sigma$, $s_1 = 4$, $s_2 = 6$ and $\alpha = 0.05$.

$X_1$ : N($\mu_1$, $\sigma$)

$X_2$ : N($\mu_2$, $\sigma$)

Let $H_0 : \mu_1 = \mu_2$ (there is no difference between the average yields) and $H_1 : \mu_1 \ne \mu_2$.

When $H_0$ is true $\mu_1 - \mu_2 = 0$ and

$\frac{(\bar{X}_1-\bar{X}_2) - 0}{\sigma \sqrt{1/n_1 + 1/n_2}}$ : N(0, 1).    (10.72)

But $\sigma$ is unknown. $\sigma$ may be estimated by S where

$S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}$

and

$\frac{(\bar{X}_1-\bar{X}_2) - 0}{S \sqrt{1/n_1 + 1/n_2}}$ : $t_{n_1+n_2-2}$.

$H_0 : \mu_1 = \mu_2$ is rejected if

$\left| \frac{\bar{x}_1-\bar{x}_2}{s \sqrt{1/n_1 + 1/n_2}} \right| \ge t_{\alpha/2, n_1+n_2-2} = t_{0.025, 28} = 2.048$;

$t_{0.025, 28} = 2.048$ is obtained from a student t table. Here $s^2 = (10 \cdot 16 + 20 \cdot 36)/28 = 31.43$, so that $s = 5.6$, and

$\left| \frac{30 - 40 - 0}{5.6 \sqrt{1/10 + 1/20}} \right| = 4.61 > 2.048$

The hypothesis is to be rejected.
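The pooled statistic of Ex. 10.41.2 can be checked with the sketch below. The function name `pooled_t` is hypothetical; the sample standard deviations are assumed to use the divisor $n_i$, as in the definition of $S^2$ above, and 2.048 is the tabulated $t_{0.025, 28}$ quoted in the example.

```python
from math import sqrt

def pooled_t(xbar1, xbar2, s1, s2, n1, n2, delta=0.0):
    """Student t statistic for H0: mu1 - mu2 = delta with a common unknown sigma.

    s1 and s2 are sample standard deviations computed with divisor n_i,
    so the pooled estimate is (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2).
    """
    s2_pooled = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    t = ((xbar1 - xbar2) - delta) / (sqrt(s2_pooled) * sqrt(1 / n1 + 1 / n2))
    return t, s2_pooled, n1 + n2 - 2

# Data of Ex. 10.41.2
t, s2_pooled, dof = pooled_t(30, 40, 4, 6, 10, 20)
t_crit = 2.048                      # tabulated t_{0.025, 28}
reject = abs(t) >= t_crit
print(round(s2_pooled, 2), dof, round(t, 3), reject)
```

The statistic is about -4.61 on 28 degrees of freedom, well outside the acceptance region.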

Comments. Here $H_0 : \mu_1 = \mu_2$ is rejected. Hence if our assumptions of normality etc. are correct, the observed difference between $\bar{x}_1$ and $\bar{x}_2$ cannot be attributed to chance alone. We can be on the safe side in assuming such an inference in this case. If we assume further that the only other variation in the observations is the difference between the effects of the two varieties of wheat, we can possibly say that the varieties may be considered to be different as far as their yields are concerned. Tests of significance will be discussed in the last chapter. Instead of two varieties, if we had k varieties and if we wanted to test the hypothesis $H_0 : \mu_1 = \mu_2 = \ldots = \mu_k$ against $H_1$ : not all $\mu$'s are equal, then a test criterion could be constructed by using the likelihood ratio technique. A general method of dealing with such problems, called the analysis of variance technique, will be discussed in the last chapter.

The problem of testing $H_0 : \mu_1 = \mu_2$ when $\sigma_1 \ne \sigma_2$, where $\sigma_1$ and $\sigma_2$ are unknown and when $n_1$ and $n_2$ are not large, does not come under any of the cases discussed so far. This problem is called the Behrens-Fisher problem. This will not be discussed here. For this and related topics, see Kenney, J.F. and E.S. Keeping, Mathematics of Statistics, Part 2, Van Nostrand Co., 1951, and other references in the bibliography at the end of this chapter.

The reader should take particular care in the philosophy of 'testing hypotheses'. Since a hypothesis is different from a fact, whenever we accept a hypothesis it does not mean that the hypothesis is true. Acceptance of a hypothesis that $\theta = \theta_0$ does not mean that $\theta = \theta_0$; $\theta$ may or may not be equal to $\theta_0$. In other words we are not making a statement of the form 'therefore $\theta = \theta_0$'. We are not proving anything.
Exercises 

10.23. Two diets are compared by conducting an experiment on two sets of 40 and 50 experimental animals. The average increases in weights due to the diets A and B are 10 lbs and 11 lbs with standard deviations 2 lbs and 3 lbs respectively. Check the claim that diet B increases the weight by 3 lbs more than that of diet A on the average.

10.24. Two methods of teaching are compared by teaching two classes of 20 and 25 students by two teachers whose I.Q.'s are … and 120 respectively. The marks obtained by the students are assumed to have the distributions N($T_1 + I_1/10$, $\sigma$) and N($T_2 + I_2/10$, $\sigma$) respectively, where T and I denote the effect of teaching and the I.Q. of the teacher. Test the hypothesis that the two methods are equally effective, at the …% level. The average marks of the two classes are 65 and 7… with standard deviations 2 and 3 respectively.

10.25. The money spent by the customers at two stores selling the same … is assumed … denote the beauty of the … at the first shop … customers spent a … .

10.26. … lbs with a standard deviation of … before undergoing the treatment and an average weight of 140 lbs with a standard deviation of 4 lbs after undergoing the treatment. The weight distributions before and after the treatment are assumed to be N($\mu_1$, $\sigma$) and N($\mu_2$, $\sigma$) respectively. Check the claim that the special treatment is effective in reducing weight, at the 5% level. Assume that the … .

10.27. … annual income … of 400 citizens in city B shows an average income of … with … . Use a normal approximation and choose a suitable level of significance.

10.28. For testing the hypothesis $H_0 : \mu_1 - \mu_2 = 20$ against the alternative $H_1$ : … in the populations N($\mu_1$, $\sigma_1$) and N($\mu_2$, $\sigma_2$), obtain n if random samples of the same size n are taken and if $\sigma_1 = 4$, $\sigma_2 = 5$, $\alpha = 0.05$ and $\beta$ = 0.0… .

10.5. TESTS CONCERNING VARIANCES 

In section 10.41 we have seen that for testing $H_0 : \mu_1 = \mu_2$ when $\sigma_1 = \sigma_2$ and when $n_1 + n_2 - 2 \le 30$ a student t statistic can be used. As a pre-requisite of this test we have the problem of testing the equality of variances. There are many practical situations where we would like to test the equality of variabilities. In this section we consider only the special case of tests concerning the variances when the populations are normal.

Consider the single sample problem of testing $H_0 : \sigma^2 = \sigma_0^2$ against the alternative,

(a) $H_1 : \sigma^2 > \sigma_0^2$; (b) $H_1 : \sigma^2 < \sigma_0^2$; (c) $H_1 : \sigma^2 \ne \sigma_0^2$.    (10.73)

That is, we have a random sample of size n from a N($\mu$, $\sigma$) and we want to test the hypothesis $\sigma^2 = \sigma_0^2$ where $\sigma_0$ is a specified quantity.

We know that $nS^2/\sigma^2$ : $\chi^2_{n-1}$    (10.74)

i.e., $nS^2/\sigma^2$ is a $\chi^2$ with $n-1$ degrees of freedom, where $S^2$ is the sample variance. The likelihood ratio technique will lead to the following tests. Reject $H_0$ if the observed value of $nS^2/\sigma^2$ when $\sigma^2 = \sigma_0^2$ is such that

(a) $ns^2/\sigma_0^2 \ge \chi^2_{\alpha, n-1}$;
(b) $ns^2/\sigma_0^2 \le \chi^2_{1-\alpha, n-1}$;
(c) $ns^2/\sigma_0^2 \ge \chi^2_{\alpha/2, n-1}$ or $ns^2/\sigma_0^2 \le \chi^2_{1-\alpha/2, n-1}$,

where the points $\chi^2_{\alpha, n-1}$, $\chi^2_{\alpha/2, n-1}$, $\chi^2_{1-\alpha, n-1}$ and $\chi^2_{1-\alpha/2, n-1}$ are illustrated in Fig. 10.16.

Fig. 10.16 gives the distribution of a $\chi^2$ with $n-1$ degrees of freedom.

Ex. 10.5.1. A random sample of 20 electric bulbs, produced according to a special process, has an average life of 1000 hrs with a standard deviation of 10 hrs. Assuming that the life time of these bulbs has a distribution N($\mu$, $\sigma$), test the hypothesis $H_0 : \sigma = 9$ against $H_1 : \sigma > 9$ at the 1% level.

Sol. $n = 20$, $s = 10$, $\sigma_0 = 9$ and $\alpha = 0.01$.

$nS^2/\sigma^2$ : $\chi^2_{n-1}$    (10.75)

This is a convenient statistic and when $\sigma = \sigma_0 = 9$

$ns^2/\sigma_0^2 = \frac{20\,(100)}{81} = 24.69$.

This is the observed value of a $\chi^2$ with $n-1 = 19$ degrees of freedom. $H_1 : \sigma > 9$ and therefore $H_0$ is rejected if the observed value of the $\chi^2_{19}$ is greater than or equal to

$\chi^2_{\alpha, n-1} = \chi^2_{0.01, 19} = 36.191$ (obtained from a $\chi^2$ table).

24.69 < 36.191.

Hence the hypothesis $H_0 : \sigma = 9$ is accepted; the sample does not give sufficient evidence in favour of $H_1 : \sigma > 9$.
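The statistic of (10.75) for the data of Ex. 10.5.1 can be recomputed as in the sketch below. The variable names are hypothetical and the critical value is the tabulated $\chi^2_{0.01, 19} = 36.191$ quoted in the example. Note that the divisor is $\sigma_0^2 = 81$, the square of the hypothesized standard deviation, and not $\sigma_0 = 9$.

```python
# Data of Ex. 10.5.1
n = 20
s = 10.0          # sample standard deviation (divisor n)
sigma0 = 9.0      # hypothesized standard deviation under H0

statistic = n * s ** 2 / sigma0 ** 2   # n s^2 / sigma0^2, equals 2000 / 81
chi2_crit = 36.191                     # tabulated chi-square point, 19 d.f., 1% level
reject = statistic >= chi2_crit
print(round(statistic, 2), reject)
```

The statistic is about 24.69, below the critical point, so $H_0 : \sigma = 9$ is not rejected at the 1% level.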

Comments. When there are a number of populations N($\mu_1$, $\sigma_1$), N($\mu_2$, $\sigma_2$), …, N($\mu_k$, $\sigma_k$), for testing the hypothesis $H_0 : \sigma_1 = \sigma_2 = \ldots = \sigma_k$ against the alternative $H_1$ : not all $\sigma$'s are equal, theorem 10.3 will give an appropriate test criterion.

Let there be two random samples of sizes $n_1$ and $n_2$ from two independent normal populations N($\mu_1$, $\sigma_1$) and N($\mu_2$, $\sigma_2$) respectively. Consider the following tests.

$H_0 : \sigma_1^2 = \sigma_2^2$ against the alternative $H_1$,

(a) $H_1 : \sigma_1^2 > \sigma_2^2$; (b) $H_1 : \sigma_1^2 < \sigma_2^2$; (c) $H_1 : \sigma_1^2 \ne \sigma_2^2$.    (10.76)

The likelihood ratio technique will lead to the following tests. Reject the hypothesis $H_0 : \sigma_1^2 = \sigma_2^2$ if

(a) $\frac{n_1 s_1^2/(n_1-1)}{n_2 s_2^2/(n_2-1)} \ge F_{\alpha, n_1-1, n_2-1}$    (10.77)

(b) $\frac{n_2 s_2^2/(n_2-1)}{n_1 s_1^2/(n_1-1)} \ge F_{\alpha, n_2-1, n_1-1}$    (10.78)

(c) $\frac{n_1 s_1^2/(n_1-1)}{n_2 s_2^2/(n_2-1)} \ge F_{\alpha/2, n_1-1, n_2-1}$ if $s_1 \ge s_2$,

$\frac{n_2 s_2^2/(n_2-1)}{n_1 s_1^2/(n_1-1)} \ge F_{\alpha/2, n_2-1, n_1-1}$ if $s_2 > s_1$.    (10.79)

Here all the tests are based on the statistic

$\frac{n_1 S_1^2/\sigma_1^2 (n_1-1)}{n_2 S_2^2/\sigma_2^2 (n_2-1)}$ : $F_{n_1-1, n_2-1}$

$= \frac{n_1 S_1^2/(n_1-1)}{n_2 S_2^2/(n_2-1)}$ when $\sigma_1^2 = \sigma_2^2$    (10.80)

and all the test criteria are based on the right tail area of an F distribution. This is achieved because of the property that if

X : $F_{n_1-1, n_2-1}$, then $\frac{1}{X}$ : $F_{n_2-1, n_1-1}$.

$F_{\alpha, n_1-1, n_2-1}$ is the point such that

$\int_{F_{\alpha, n_1-1, n_2-1}}^{\infty} f(F)\, dF = \alpha$    (10.81)

where $f(F)$ is the density function of an F with $n_1-1$ and $n_2-1$ degrees of freedom.

Ex. 10.5.2. The diameters of two samples, each of size 10, of bullets produced by two machines have standard deviations $s_1 = 0.01$ and $s_2 = 0.015$. Assuming that the diameters have independent distributions $N(\mu_1, \sigma_1)$ and $N(\mu_2, \sigma_2)$, test the hypothesis that the two machines are equally good by testing $H_0 : \sigma_1 = \sigma_2$ against $H_1 : \sigma_1 \neq \sigma_2$.

Sol. $s_1 = 0.01$, $s_2 = 0.015$, $n_1 = n_2 = 10$, and let $\alpha = 0.02$. Under $H_0$,

$$\frac{n_1 s_1^2/(n_1-1)}{n_2 s_2^2/(n_2-1)} : F_{n_1-1,\,n_2-1}.$$

Since $s_2 > s_1$ and $H_1 : \sigma_1 \neq \sigma_2$, the hypothesis $H_0 : \sigma_1 = \sigma_2$ is rejected when

$$\frac{n_2 s_2^2/(n_2-1)}{n_1 s_1^2/(n_1-1)} \ge F_{\alpha/2,\,n_2-1,\,n_1-1} = F_{0.01,\,9,\,9} \qquad (10.82)$$

[see (10.79)]. Here

$$\frac{n_2 s_2^2/(n_2-1)}{n_1 s_1^2/(n_1-1)} = \frac{(0.015)^2}{(0.01)^2} = 2.25$$

and $F_{0.01,\,9,\,9} = 5.35$ (obtained from an F-table), i.e., $2.25 < 5.35$. Hence the hypothesis is accepted. The machines may be considered to be equally good.
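The arithmetic of Ex. 10.5.2 can be checked with a few lines of Python. This is only a minimal sketch: the function name `variance_ratio` is our own, and the critical value 5.35 is copied from the F-table quoted in the example rather than computed.

```python
def variance_ratio(n1, s1, n2, s2):
    """Return (F, df_numerator, df_denominator) with the larger estimate on top.

    s1 and s2 are sample standard deviations computed with divisor n,
    so n * s**2 / (n - 1) is the unbiased estimate of the variance.
    """
    v1 = n1 * s1 ** 2 / (n1 - 1)
    v2 = n2 * s2 ** 2 / (n2 - 1)
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1
    return v2 / v1, n2 - 1, n1 - 1


# Data of Ex. 10.5.2, two-sided alternative with alpha = 0.02.
f_obs, df_num, df_den = variance_ratio(10, 0.01, 10, 0.015)
F_CRITICAL = 5.35  # tabled F_{0.01, 9, 9}, taken from the text
reject_h0 = f_obs >= F_CRITICAL
print(round(f_obs, 2), df_num, df_den, reject_h0)
```

Putting the larger estimate in the numerator is what allows every case to be referred to the right tail of an F distribution.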


Exercises 

... the thickness of metal plates produced by a machine ... assumed to have ...

10.30. The weight of paper bags is assumed to have a distribution $N(\mu, \sigma)$. A random sample of 40 bags shows a standard deviation of 0.05 units. Test the hypothesis $H_0 : \sigma = 0.03$ against $H_1 : \sigma \neq 0.03$, at the 5% level.

[Hint. $\sqrt{2\chi^2} - \sqrt{2n-1}$ is approximately a standardized normal when $n > 30$.]

10.31. The following data give the time spent by random samples of boys and girls for solving a problem. Assuming that the samples may be considered to be from $N(\mu_1, \sigma_1)$ and $N(\mu_2, \sigma_2)$ respectively, test the hypothesis $\sigma_1 = \sigma_2$ against the alternative $\sigma_2 \neq \sigma_1$, at the 10% level.

Girls. 20, 22, 18, 15, 16, 18, 20, 18.
Boys. 18, 20, 20, 22, 16, 14, 12, 8, 10, 8.


10.6. TESTS CONCERNING PROPORTIONS 


So far we have been considering the problem of testing hypotheses concerning the parameters of a continuous population. In this section we shall briefly discuss the problem of testing of hypotheses regarding the parameters of a discrete distribution. For convenience and simplicity we shall consider only a binomial probability situation; that is, testing whether a coin is unbiased, a particular drug reduces the mortality rate by 15%, the true proportion of defective items in a large shipment is 10%, etc. The general problem can be formulated as follows :

$H_0 : p = p_0$

$H_1$ : (a) $p > p_0$, (b) $p < p_0$, (c) $p \neq p_0$     (10.83)

where $p_0$ is a specified value of the probability of a success in any trial of a binomial probability situation. Our tests will be based on the number of successes $x$ observed in a sample of $n$ trials. The likelihood ratio technique will give the following tests. Reject the hypothesis $H_0$ if

(a) $x \ge x_0$ where $$\sum_{x=x_0}^{n} f(x, n, p_0) \le \alpha \qquad (10.84)$$

(b) $x \le x_1$ where $$\sum_{x=0}^{x_1} f(x, n, p_0) \le \alpha \qquad (10.85)$$

(c) $x \ge x_3$ or $x \le x_2$ where $$\sum_{x=0}^{x_2} f(x, n, p_0) \le \alpha/2 \quad \text{and} \quad \sum_{x=x_3}^{n} f(x, n, p_0) \le \alpha/2 \qquad (10.86)$$

where $f(x, n, p_0)$ is the binomial probability function with the parameters $n$ and $p_0$, $\alpha$ is the probability of the type I error and $x_0, x_1, x_2, x_3$ are the nearest integers which satisfy the various inequalities in (10.84), (10.85) and (10.86).


Ex. 10.6.1. Out of 20 babies born in a given hospital 12 are girls. Assuming a binomial probability situation, test the hypothesis that the probability of birth of a baby girl is 1/2 against the alternative that it is greater than 1/2, at the 5% level.

Sol. Let $p$ be the probability of birth of a baby girl.

$p_0 = 1/2$, $n = 20$, $x = 12$, and $\alpha = 0.05$.

$H_1 : p > p_0$ and therefore $H_0$ is rejected if $x \ge x_0$ where $x_0$ is such that

$$\sum_{x=x_0}^{n} f(x, n, p_0) \le 0.05.$$

From a binomial probability table for $n = 20$ and $p = 1/2$ it is seen that

$$\sum_{x=15}^{20} \binom{20}{x} (1/2)^x (1/2)^{20-x} < 0.03.$$

Therefore $x_0 = 15$ and the observed value is 12. Hence the hypothesis is accepted at the 5% level.
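The critical value $x_0$ of (10.84) need not be read from a table; it can be found by accumulating binomial probabilities from the upper end. The following Python sketch does this for Ex. 10.6.1. The function names are our own and the code is meant as an illustration only.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Binomial probability function f(x, n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def upper_critical_value(n, p0, alpha):
    """Smallest x0 with P(X >= x0 | p0) <= alpha, or None if there is none."""
    tail = 0.0
    x0 = None
    for x in range(n, -1, -1):      # add terms starting from x = n
        tail += binomial_pmf(x, n, p0)
        if tail > alpha:            # including x would exceed alpha
            break
        x0 = x
    return x0


# Ex. 10.6.1: n = 20, p0 = 1/2, alpha = 0.05, observed x = 12.
x0 = upper_critical_value(20, 0.5, 0.05)
reject_h0 = 12 >= x0
print(x0, reject_h0)
```

For the lower-tailed test (10.85) the same accumulation may be run from $x = 0$ upwards.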


Comments. When the number of trials $n$ is sufficiently large, more specifically, when $np > 5$ and $nq > 5$, a normal approximation to the binomial is valid. In this case our test may be based on the normal variable

$$Z = \frac{X - np}{\sqrt{npq}} : N(0, 1) \qquad (10.87)$$

where $X$ denotes the number of successes in $n$ trials and $q = 1 - p$ (with $p = p_0$ under $H_0$). The test criteria for testing the various hypotheses in (10.83) are therefore, reject $H_0$ when

(a) $z > z_\alpha$ ; (b) $z < -z_\alpha$ ; (c) $|z| > z_{\alpha/2}$

where $z$ is the observed value of $Z$, and $z_\alpha$ and $z_{\alpha/2}$ are such that

$$\int_{z_\alpha}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = \alpha \quad \text{and} \quad \int_{z_{\alpha/2}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = \alpha/2. \qquad (10.88)$$
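As an illustration of (10.87) and (10.88), the following Python sketch computes the observed $z$ and the point $z_\alpha$ using the standard library. The figures (60 successes in 100 trials, $p_0 = 0.5$, $\alpha = 0.05$) are hypothetical and are not taken from the text; the function names are our own.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(x, n, p0):
    """Observed value of Z = (X - n p0) / sqrt(n p0 q0) under H0 : p = p0."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))


def upper_point(alpha):
    """The point z_alpha cutting off a right tail area alpha of N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha)


# Hypothetical data: 60 successes in 100 trials, H0 : p = 0.5, H1 : p > 0.5.
n, x, p0, alpha = 100, 60, 0.5, 0.05
assert n * p0 > 5 and n * (1 - p0) > 5   # condition for the approximation
z = z_statistic(x, n, p0)
z_alpha = upper_point(alpha)
reject_h0 = z > z_alpha
print(round(z, 3), round(z_alpha, 3), reject_h0)
```

For the two-sided alternative (c) one would compare `abs(z)` with `upper_point(alpha / 2)`.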


The two sample problem (the problem of testing the difference between $p_1$ and $p_2$ in two independent binomial situations with the parameters $n_1, p_1$ and $n_2, p_2$ respectively) can be treated in a similar fashion. Tests concerning $k$ proportions are discussed in the next chapter.

In all the problems discussed in this chapter we made decisions based on a sample of pre-assigned size $n$. This may be an unnecessary restriction. We might be able to make the same decisions by taking fewer observations. Another drawback of the methods discussed so far is that in certain problems there may be more than two possible choices. We considered only two choices, namely either accept the hypothesis or reject the hypothesis. The process of decision making when there are a number of choices available is sometimes called the multiple decision problem. Instead of taking a sample of pre-assigned size and testing a hypothesis, we may decide to take additional observations only after considering the information available so far; this method is called a sequential testing procedure. For example we may start with a sample of size $m$. The null hypothesis is tested. Suppose that the choices are (1) accept $H_0$, (2) reject $H_0$, (3) continue sampling. If our decision is to continue sampling we will take one more observation and test $H_0$. Depending upon the result of this test either sampling is continued or $H_0$ is accepted or rejected. The likelihood ratio test, modified to suit the three choices, can be used in the sequential procedure. For multiple decision problems and sequential procedures see the bibliography at the end of this chapter.


Exercises 


10.32. In a binomial probability situation of 20 trials, if $p$ is the probability of a success and if the hypothesis $H_0 : p = 0.3$ is tested against the alternative $p = 0.2, 0.5, 0.7$, obtain the probability of the type II error if $\alpha = 0.05$, in each case.

10.33. In a binomial probability situation of 15 trials obtain a test for testing the hypothesis $H_0 : p = 0.60$ against the alternative $H_1 : p > 0.60$, if $\alpha = 0.05$.

10.34. Out of 20 patients who are given a particular injection 18 survived. Will you reject the hypothesis that the survival rate is 85% in favour of the hypothesis that it is more, at the 5% level ?

10.35. A random sample of 1000 persons in a country shows that 550 favoured a particular legislature. Test the hypothesis that more than 50% of the people favour the legislature, at the 1% level.

10.36. An opinion survey conducted on random samples of 400 women and ... men shows that 200 women and 325 men are in favour of ... in a particular place. Test the hypothesis that the true proportions are the same, at the ... level, against the alternative that they are not the same.

10.37. Based on a random sample of size 2 from a Poisson population with parameter ..., obtain a test for testing the hypothesis $H_0$ : ... against the alternative $H_1$ : ...

10.38. What must be the size of the sample if $\alpha = 0.05$, $\beta = 0.08$ for testing $H_0 : \lambda = 1$ against $H_1 : \lambda = 1.5$ where $\lambda$ is the parameter of a Poisson population.

10.39. The following data give the yields of independent plots of three varieties of wheat.

b 2 J It Z'n I 6 ’ ' 2 ' 44j 46, 40 ' 38 ' * 2 ' 45 ' 47 ‘ 

C Oo J 40, 42, * 5 > 40, 39, 41, 44 4S. 

' 4j 46y 4 °' 44j 48 ' 60 > 62 > 51, 49, 46, 48, 50. 


Assuming that these samples may be considered to be from $N(\mu_1, \sigma_1)$, $N(\mu_2, \sigma_2)$, $N(\mu_3, \sigma_3)$ respectively, test the following hypotheses at the 1% level.

(1) $\mu_1 = \mu_2 = \mu_3$, $\sigma_1 = \sigma_2 = \sigma_3 = 2$ (given) ;

(2) $\sigma_1 = \sigma_2 = \sigma_3$ ;

(3) $\mu_1 = \mu_2 = \mu_3$, $\sigma_1 = \sigma_2 = \sigma_3 = \sigma$ and $\sigma$ is unknown.

10.7. SUMMARY 


A summary of the simple tests discussed in this chapter is given in the following table. For convenience only two-sided tests are given except for problem 1. In problem 1, all the one-sided and two-sided alternatives are discussed. For other situations which are not discussed in the following table the likelihood ratio technique will help us to obtain appropriate test criteria. Single sample tests are based on a random sample of size $n$ and two sample tests are based on random samples of sizes $n_1$ and $n_2$. The usual notations for the sample mean, the sample variance, etc., are used. All the tests are assumed to have a critical region of size $\alpha$. In two sample problems the populations are assumed to be independent. In problems 7 and 8 of the table, $S^2$ is an unbiased estimator of $\sigma^2$ :

$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} ;$$

$$s_1^2 = \sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2 / n_1 ; \qquad s_2^2 = \sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2 / n_2 ;$$

$$S'^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n-1).$$

If $r$ is the sample correlation coefficient of a random sample of size $n$ from a bivariate normal population and if $\rho$ is the population correlation coefficient then $\frac{1}{2} \log_e \frac{1+r}{1-r}$ is approximately normally distributed with mean $\mu = \frac{1}{2} \log_e \frac{1+\rho}{1-\rho}$ and with the standard deviation $\sigma = 1/\sqrt{n-3}$. This property enables us to carry out the test in problem 13 of the following table.
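The normal approximation for $\frac{1}{2} \log_e \frac{1+r}{1-r}$ stated above lends itself to a short computation. The sketch below is illustrative only: the sample values ($r = 0.6$, $n = 28$, hypothesized $\rho_0 = 0$, $\alpha = 0.05$) are hypothetical, and the function names are our own.

```python
from math import log, sqrt
from statistics import NormalDist


def half_log_ratio(r):
    """The transform (1/2) log_e((1 + r) / (1 - r)) for -1 < r < 1."""
    return 0.5 * log((1 + r) / (1 - r))


def correlation_test(r, n, rho0, alpha):
    """Two-sided test of H0 : rho = rho0 by the normal approximation.

    Returns the standardized statistic and whether H0 is rejected.
    """
    z = (half_log_ratio(r) - half_log_ratio(rho0)) * sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, abs(z) > critical


# Hypothetical sample: r = 0.6 from n = 28 pairs, H0 : rho = 0.
z_obs, rejected = correlation_test(0.6, 28, 0.0, 0.05)
print(round(z_obs, 3), rejected)
```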

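$$\sum_{x=0}^{K_{\alpha/2}} \binom{n}{x} p_0^x q_0^{n-x} \le \alpha/2 \quad \text{and} \quad \sum_{x=K_{1-\alpha/2}}^{n} \binom{n}{x} p_0^x q_0^{n-x} \le \alpha/2$$

where $q_0 = 1 - p_0$ and $K_{1-\alpha/2}$ and $K_{\alpha/2}$ are the nearest integers which satisfy the above inequalities.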
CHAPTER 11


CATEGORIZED DATA AND THE $\chi^2$ STATISTIC

11.0. Introduction. In the last chapter we considered the problem of testing various hypotheses regarding the parameters of a distribution, based on the observations on an observable stochastic variable. Sometimes we will be interested in testing the association between two attributes or in testing whether a particular distribution is a good fit to a data, etc. Problems of this nature are of practical importance. For example, suppose we have measured the heights of a random sample of university students. The data may be given as shown in the following table.


Height       55-60   60-65   65-70   70-75   75-80   80-85

Frequency    n_1     n_2     n_3     n_4     n_5     n_6


The data is classified into various classes and the number of individuals (frequencies) in the various classes are given. If we can find out the best fitting theoretical distribution to this frequency table we will be able to test various hypotheses regarding the distribution of heights of the university students under consideration. Such problems are called 'goodness of fit' problems.


Study of association between attributes or variables is of some use in many practical situations. One may be interested in seeing whether there is any association between the heights of persons and their intelligence, habit of wearing a tall hat and longevity of life, colour of eyes and the weight of persons, etc. If a data is given in the form of frequencies falling into different categories then such a data is called a categorized data. These different categories may be characterized by measurable quantities like temperature, height, length, etc., or non-measurable quantities like colour, state of existence, etc. In the study of categorized data a $\chi^2$ statistic is often very useful. In this chapter we will consider the use of a $\chi^2$ statistic in the analysis of categorized data. In the next chapter some other statistics will be considered.
11.1. GOODNESS OF FIT


The problem that is considered in this section is only a particular case of goodness of fit problems. We will not consider the existence of a best fitting distribution to a given data, but we will examine whether the data is compatible with a given theoretical distribution. In other words we will test the hypothesis that the data may be considered to be the observed values of (or the values assumed by) a stochastic variable having a specified distribution. Consider a multinomial probability situation (that is, an experiment resulting in $k$ mutually exclusive outcomes, with probabilities $p_1, p_2, \ldots, p_k$ where $\sum_{i=1}^{k} p_i = 1$).

Let $n_1, n_2, \ldots, n_k$ be the observed outcomes in $n$ independent trials (that is, $n = n_1 + n_2 + \cdots + n_k$). The joint probability function of $n_1, \ldots, n_k$ is

$$f(n_1, n_2, \ldots, n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\; p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}. \qquad (11.1)$$

$e_i = E(n_i) = np_i$ and the maximum likelihood estimates of $p_i$ may be easily seen to be $n_i/n$ [obtained by maximizing $f(n_1, \ldots, n_k)$ subject to the condition $\sum p_i = 1$].

Let us examine whether a specified multinomial distribution is a good fit to the observed data. This is equivalent to the problem of testing the hypothesis

$$H_0 : p_i = p_{i0} \quad \text{for } i = 1, 2, \ldots, k$$

where $p_{i0}$ is the specified value of $p_i$. The likelihood ratio criterion $\lambda$ in this case may be obtained as

$$-2 \log \lambda = 2 \sum_{i=1}^{k} n_i \log \frac{n_i}{e_i}, \qquad e_i = np_{i0}. \qquad (11.2)$$

According to Theorem 10.3, $-2 \log \lambda$ has a $\chi^2$ distribution with degrees of freedom equal to the number of parameters specified by the null hypothesis $H_0$, when $n$ is sufficiently large. Here the number of degrees of freedom is $k-1$ since $\sum p_i = 1$ and only $k-1$ parameters are specified by $H_0$. $-2 \log \lambda$ in the equation (11.2) may be simplified to the form

$$-2 \log \lambda \approx \sum_{i=1}^{k} \frac{(n_i - e_i)^2}{e_i}. \qquad (11.3)$$

The simplification involves some manipulations and the evaluation of the limits when $n \to \infty$ and is left to the reader. Therefore we may state the following theorem.


Theorem 11.1. If $n_1, n_2, \ldots, n_k$ and $e_1, e_2, \ldots, e_k$ are the observed and theoretical frequencies in a multinomial probability situation respectively, then

$$\sum_{i=1}^{k} \frac{(n_i - e_i)^2}{e_i} = \sum \frac{(\text{Observed frequency} - \text{Expected frequency})^2}{\text{Expected frequency}} \qquad (11.4)$$

is approximately distributed as a $\chi^2$ distribution with $k-1$ degrees of freedom when $n$ (the number of trials) is sufficiently large.

(A good approximation is obtained when $k \ge 5$ and $e_i \ge 5$ for $i = 1, 2, \ldots, k$.)

Ex. 11.1.1. The fishes caught by a man fishing at a certain spot in a lake are classified according to their weights in lbs and are given in the following table.


Weight       less than 1   1-2   2-3   3-4   4-5   over 5

Frequency         6         7    13    17     6      5


Examine whether the data is compatible with the assumption that anyone fishing at this place will catch fishes in the ratio 1 : 1 : 2 : 3 : 1 : 1 in the various weight groups.

Sol. Total number of fishes caught $= n = 54$. The expected frequencies in the various weight groups are 6, 6, 12, 18, 6, 6.


Observed frequencies   n_1 = 6   n_2 = 7   n_3 = 13   n_4 = 17   n_5 = 6   n_6 = 5

Expected frequencies   e_1 = 6   e_2 = 6   e_3 = 12   e_4 = 18   e_5 = 6   e_6 = 6


$k$ = the number of classes = 6.

$$\chi^2 = \chi^2_{k-1} = \sum_{i=1}^{k} \frac{(n_i - e_i)^2}{e_i} = 0/6 + 1/6 + 1/12 + 1/18 + 0/6 + 1/6 = 0.47.$$

Comments. 0.47 is less than the tabled value of a $\chi^2$ with 5 degrees of freedom at the 5% level. The observed $\chi^2$ is not in the critical region and hence we will accept the hypothesis that the data is compatible with the assumption that the ratios of the various weight groups are 1 : 1 : 2 : 3 : 1 : 1. In a problem if one class frequency is less than 5 then it may be combined with the adjoining class so that the combined class frequency may be greater than 5. It may be noticed that the length of the class intervals, the units of measurements, etc. have nothing to do with the $\chi^2$ statistic. The same $\chi^2$ statistic may be used to test the hypothesis that $p_1 = p_2 = \cdots = p_k$ where $p_1, p_2, \ldots, p_k$ are the population proportions in $k$ independent binomial populations.
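The computation in Ex. 11.1.1 can be reproduced in a few lines. The Python sketch below evaluates both the statistic (11.4) and the likelihood ratio form (11.2) for the fish data, so that the closeness of the two can be seen numerically. The function names are our own, and the critical value 11.070 is the tabled $\chi^2$ value for 5 degrees of freedom at the 5% level, entered as a constant.

```python
from math import log


def expected_from_ratio(ratio, total):
    """Expected frequencies when `total` items are split in the given ratio."""
    s = sum(ratio)
    return [total * r / s for r in ratio]


def pearson_chi_square(observed, expected):
    """Sum of (n_i - e_i)^2 / e_i, as in (11.4)."""
    return sum((n - e) ** 2 / e for n, e in zip(observed, expected))


def likelihood_ratio_statistic(observed, expected):
    """2 * sum of n_i log(n_i / e_i), as in (11.2); empty classes add zero."""
    return 2 * sum(n * log(n / e) for n, e in zip(observed, expected) if n > 0)


observed = [6, 7, 13, 17, 6, 5]
expected = expected_from_ratio([1, 1, 2, 3, 1, 1], sum(observed))
chi2 = pearson_chi_square(observed, expected)
g2 = likelihood_ratio_statistic(observed, expected)
CRITICAL_5_DF_5_PERCENT = 11.070   # from a chi-square table
accept_h0 = chi2 < CRITICAL_5_DF_5_PERCENT
print(expected, round(chi2, 2), round(g2, 2), accept_h0)
```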


If a theoretical distribution is fitted to a given data and $t$ parameters are estimated while fitting the distribution, then the $\chi^2$ statistic of equation (11.4) may be used to test the goodness of fit but the degrees of freedom will be $(k-1) - (t)$ where $k$ is the number of classes.


Ex. 11.1.2. The accident rates on a particular highway are given in the following table. Examine whether the data may be assumed to follow a Poisson distribution.


Number of accidents             0    1    2    3    4    5    6

Number of days (frequencies)   25   30   20   15    5    3    2


Sol. We want to examine whether the distribution

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$$

is a good fit to the data, where $x$ denotes the number of accidents. $\lambda$ is to be estimated. The sample mean $\bar{x}$ may be used as an estimate for $\lambda$.

$$\bar{x} = [(0)(25) + (1)(30) + \cdots + (6)(2)]/100 = 1.62.$$

We want to examine the goodness of fit of

$$f(x) = \frac{(1.62)^x e^{-1.62}}{x!}.$$

The expected frequencies for various values of $x$ are obtained by evaluating

$$100\, \frac{(1.62)^x e^{-1.62}}{x!} \quad \text{for } x = 0, 1, \ldots$$


Number of       Frequencies   Poisson probabilities   Expected frequencies
accidents (x)   (n_i)         f(x) for λ = 1.62       (e_i)

     0              25             0.2019                 20.19
     1              30             0.3230                 32.30
     2              20             0.2584                 25.84
     3              15             0.1378                 13.78
     4               5             0.0551                  5.51
     5               3             0.0176                  1.76
     6               2             0.0047                  0.47

Classes 4, 5 and 6 combined: observed frequency 10, expected frequency 7.74.


(The tabulated probabilities agree with the Poisson formula evaluated at $\lambda = 1.6$, that is, with the estimate rounded to one decimal place.)

The expected frequencies for $x = 6$ and for $x = 5$ are less than 5. Further $e_6 + e_5 < 5$. Hence they are combined with $e_4$. Therefore $k = 5$ and the number of parameters estimated = 1.

$$\chi^2 = \chi^2_{k-1-1} = \sum_{i=1}^{k} \frac{(n_i - e_i)^2}{e_i} = \frac{(25-20.19)^2}{20.19} + \frac{(30-32.30)^2}{32.30} + \cdots + \frac{(10-7.74)^2}{7.74} = 3.40.$$

The tabulated value of a $\chi^2_3$ at the 5% level is greater than 3.40 and hence the observed $\chi^2_3$ does not fall in the critical region. The hypothesis may be accepted. A Poisson distribution with $\lambda = 1.62$ can be assumed to be a good fit.
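The steps of Ex. 11.1.2 (estimate $\lambda$, compute expected frequencies, pool small classes, form the statistic) can be sketched in Python as follows. Note the assumptions: the code uses the unrounded estimate 1.62 in the Poisson formula and only the classes $x = 0, \ldots, 6$, so its expected frequencies differ slightly from those tabulated in the example; the function names are our own, and 7.815 is the tabled $\chi^2$ value for 3 degrees of freedom at the 5% level.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability lam**x * exp(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def pool_upper_classes(observed, expected, minimum=5.0):
    """Pool classes from the upper end until the last expected
    frequency is at least `minimum`; returns new lists."""
    obs, exp_freq = list(observed), list(expected)
    while len(exp_freq) > 1 and exp_freq[-1] < minimum:
        last_e = exp_freq.pop()
        last_o = obs.pop()
        exp_freq[-1] += last_e
        obs[-1] += last_o
    return obs, exp_freq


days = [25, 30, 20, 15, 5, 3, 2]          # frequencies for x = 0, 1, ..., 6
total = sum(days)
lam_hat = sum(x * f for x, f in enumerate(days)) / total   # sample mean
expected = [total * poisson_pmf(x, lam_hat) for x in range(len(days))]
obs_pooled, exp_pooled = pool_upper_classes(days, expected)
chi2 = sum((n - e) ** 2 / e for n, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 1              # one parameter (lambda) estimated
CRITICAL_3_DF_5_PERCENT = 7.815           # from a chi-square table
accept_h0 = chi2 < CRITICAL_3_DF_5_PERCENT
print(lam_hat, obs_pooled, [round(e, 2) for e in exp_pooled], df, accept_h0)
```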

Ex. 11.1.3. The bust measurements of 80 women are given in the following table. The mean and the standard deviation of the measurements, before the observations are classified, are 35 and 2 respectively. Test the goodness of fit of a normal distribution to this data.


Measurements   30 or less   31-32   33-34   35-36   37-38   39 or more

Frequencies         8         12      15      20      15        10


Sol. The various classes may be assumed to be less than 30.5, 30.5 to 32.5, 32.5 to 34.5, etc. The expected frequency in the interval 30 or less = 80 times the probability that a normal variable with the parameters $\mu = 35$ and $\sigma = 2$ falls below 30.5

$$= 80 \int_{-\infty}^{30.5} \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(x-35)^2}{2(4)}}\, dx = 80 \int_{-\infty}^{\frac{30.5-35}{2}} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = 80(0.0122) = 0.976.$$

The total frequency in the class 32.5 or less

$$= 80 \int_{-\infty}^{\frac{32.5-35}{2}} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt = 8.448.$$

The frequency in the class 30.5 to 32.5 = 8.448 - 0.976 = 7.472, etc.


Class            Observed      Cumulated     Expected
intervals        frequencies   frequencies   frequencies

30.5 or less          8           0.976         0.976
30.5-32.5            12           8.448         7.472
32.5-34.5            15          32.104        23.656
34.5-36.5            20          61.872        29.768
36.5-38.5            15          76.792        14.920
38.5 or more         10          80.000         3.208

First two classes combined: observed 20, expected 8.448.
Last two classes combined: observed 25, expected 18.128.


$k = 4$ and 2 parameters are estimated. Hence

$$\chi^2 = \chi^2_{k-1-2} = \chi^2_1 = \frac{(20-8.448)^2}{8.448} + \frac{(15-23.656)^2}{23.656} + \frac{(20-29.768)^2}{29.768} + \frac{(25-18.128)^2}{18.128} > 6.635.$$

6.635 is the tabulated value of a $\chi^2_1$ at the 1% level. Hence the hypothesis is rejected.


Comments. Here $k = 4 < 5$. Hence the approximation of $\sum (n_i - e_i)^2/e_i$ to a $\chi^2_1$ is not a good approximation. From these examples it is seen that the applicability of a $\chi^2$ test in testing goodness of fit is limited due to the need of classification, etc. Some exact tests will be considered in the next chapter.
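The expected frequencies of Ex. 11.1.3 come from differences of normal probabilities at the class boundaries, which is easy to sketch in Python with the standard library. The function name below is our own; because the code does not round the normal probabilities to four places, its figures agree with the table above only to about two decimal places.

```python
from statistics import NormalDist


def expected_frequencies(boundaries, mean, sd, total):
    """Expected class frequencies for classes cut at the given boundaries,
    with open-ended first and last classes."""
    dist = NormalDist(mean, sd)
    cumulative = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
    return [total * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]


observed = [8, 12, 15, 20, 15, 10]
boundaries = [30.5, 32.5, 34.5, 36.5, 38.5]
expected = expected_frequencies(boundaries, 35, 2, sum(observed))

# Pool the first two and the last two classes, as in the example.
obs4 = [observed[0] + observed[1], observed[2], observed[3],
        observed[4] + observed[5]]
exp4 = [expected[0] + expected[1], expected[2], expected[3],
        expected[4] + expected[5]]
chi2 = sum((n - e) ** 2 / e for n, e in zip(obs4, exp4))
df = len(obs4) - 1 - 2                 # mean and standard deviation estimated
CRITICAL_1_DF_1_PERCENT = 6.635        # from a chi-square table
reject_h0 = chi2 > CRITICAL_1_DF_1_PERCENT
print([round(e, 3) for e in expected], round(chi2, 2), df, reject_h0)
```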


Exercises 


11.1. Test the goodness of fit of a multinomial distribution to the following data from an experiment of rolling a die 50 times. (The faces of the die are marked 1, 2, ..., 6.)


Face numbers   1   2   3   4   5    6

Frequencies    7   8   9   8   8   10


Assume that $p_1 = p_2 = \cdots = p_6$.

11.2. A historical monument is visited by 1000 people on a particular day. The exact categorization is given below.

Visitors from   North Amer.   South Amer.   Europe   Africa   Asia

Frequency           400            50         250      100     200

Is the data compatible with the assumption that the ratios will be 4 : 1 : 2 : 1 : 2 respectively ?

11.3. The telephone calls received at an office switch-board are counted at every minute and the following data is obtained.

Number of calls (x)             0    1    2    3    4    5   6   7

Corresponding number of
one minute intervals           40   60   50   30   20   10   3   1


Test the goodness of fit of a Poisson distribution to the data.

11.4. The following is the classification of the families in a township according to the milk consumption.

Consumption of milk   5-10 units   11-15   16-22   23-27   28-32   33 or more

Number of families       200        180     170     140     100       3)


Test the goodness of fit of an exponential distribution to the data. The average consumption = 18 units (calculated before classification).

11.5. The marks obtained by 100 students have a mean 70 with a standard deviation 5. The marks are classified and then given in the following table.


Marks                50 or less   51-60   61-70   71-80   81-90   91-100

Number of students        5         10      30      25      20      10


Is the data compatible with the assumption that the marks are normally distributed ? Can you generalize your findings ? If not, why ?
11.2. CONTINGENCY TABLES

The following is the data of waist measurements of 100 women classified according to their intelligence.


Measurements \ Intelligence   Very intelligent   Average   Below average

16 or less                          10               8            8
17-18                                6               7            8
19-20                                7               8            7
21-22                                5               6            5
23 or more                           5               5            5


The numbers in the various cells are the frequencies, i.e. the number of measurements falling under the corresponding characteristics of classification. If the data is classified according to two or more characteristics (measurable or not) and is given in a frequency table then such a table is often called a contingency table. The study of association between two characteristics of classification is of some practical interest. For example we would like to test whether there is any association between heights and intelligence, oratorical talents and the quantity of food consumed, aptitude for Mathematics and the interest in games, etc. A $\chi^2$ statistic can be conveniently used to test the independence of the classification in a two-way contingency table. Let $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_s$ be the categories of the two characteristics under consideration and let the data be given as shown in the following table.

          B_1      B_2      ...    B_s      Totals

A_1       n_11     n_12     ...    n_1s     n_1.
A_2       n_21     n_22     ...    n_2s     n_2.
 .         .        .               .        .
 .         .        .               .        .
A_r       n_r1     n_r2     ...    n_rs     n_r.

Totals    n_.1     n_.2     ...    n_.s     n_..


$n_{ij}$ = the number of observations (frequency) corresponding to $A_i$ and $B_j$, $i = 1, 2, \ldots, r$ ; $j = 1, 2, \ldots, s$ ;

$$\sum_{j=1}^{s} n_{ij} = n_{i.} \; ; \qquad \sum_{i=1}^{r} n_{ij} = n_{.j} \; ; \qquad \sum_{i=1}^{r} \sum_{j=1}^{s} n_{ij} = n_{..}$$

In the above notation, the summation with respect to a suffix is denoted by a dot.

Let $p_{ij}$ be the probability of getting an observation in the $(ij)$th cell ($i$th row, $j$th column cell), then

$$\sum_{j=1}^{s} p_{ij} = p_{i.} = \text{the probability of getting an observation in the } i\text{th row},$$

$$\sum_{i=1}^{r} p_{ij} = p_{.j} = \text{the probability of getting an observation in the } j\text{th column}.$$

If the classifications are independent of one another then $p_{ij} = p_{i.}\, p_{.j}$, $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, s$.

Consider the hypothesis,

$H_0 : p_{ij} = p_{i.}\, p_{.j}$ (There is independence in the classification.)

$H_1 : p_{ij} \neq p_{i.}\, p_{.j}$ at least for one $i$ and $j$ (There is no independence.)

Under $H_0$ the expected frequency in the $(ij)$th cell is

$$e_{ij} = (p_{i.}\, p_{.j})\, n_{..}$$

Since $p_{i.}$ and $p_{.j}$ are unknown they may be estimated by

$$\hat{p}_{i.} = \frac{n_{i.}}{n_{..}} \quad \text{and} \quad \hat{p}_{.j} = \frac{n_{.j}}{n_{..}}.$$

Hence

$$e_{ij} = \frac{n_{i.}}{n_{..}} \cdot \frac{n_{.j}}{n_{..}} \cdot n_{..} = \frac{n_{i.}\, n_{.j}}{n_{..}}$$

and

$$\sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \chi^2_{(r-1)(s-1)} \text{ approximately.}$$

The degrees of freedom are the number of classes minus one minus the number of independent parameters estimated, that is, $rs - 1 - (r + s - 2) = (r-1)(s-1)$. Since $\sum_{i=1}^{r} p_{i.} = 1$ and $\sum_{j=1}^{s} p_{.j} = 1$, only $r + s - 2$ independent parameters are estimated. The approximation is good if all the $e_{ij} \ge 5$ and $rs \ge 5$.

r t 

Ex. 11.2.1. Check whether the data given in section 11.2 are compatible with the assumption that there is no association between intelligence and waist measurements in women.

Sol. The data is reproduced in the following table.


          B_1          B_2          B_3          Total

A_1       10 (8.58)     8 (8.84)     8 (8.58)      26
A_2        6 (6.93)     7 (7.14)     8 (6.93)      21
A_3        7 (7.26)     8 (7.48)     7 (7.26)      22
A_4        5 (5.28)     6 (5.44)     5 (5.28)      16
A_5        5 (4.95)     5 (5.10)     5 (4.95)      15

Total     33           34           33            100


$e_{11} = (26)(33)/100 = 8.58$

$e_{12} = (26)(34)/100 = 8.84$, etc.

The estimated expected frequencies are given in parentheses in each cell.

$$(r-1)(s-1) = (4)(2) = 8.$$

$$\chi^2_8 = \sum_{i=1}^{5} \sum_{j=1}^{3} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = \frac{(10-8.58)^2}{8.58} + \frac{(6-6.93)^2}{6.93} + \cdots + \frac{(5-4.95)^2}{4.95} < 15.507$$

where 15.507 is the tabled value of a $\chi^2$ with 8 degrees of freedom at the 5% level. Hence we accept the null hypothesis; based on the data given, there is no evidence of any association between the waist measurements and intelligence.

Comments. Even if our null hypothesis of independence 
is rejected we cannot generalize our results. We can only say that 
the data is not compatible with the hypothesis. 
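The test of independence in a two-way table is mechanical enough to sketch in code. The Python functions below (names are our own) compute the estimated expected frequencies $n_{i.} n_{.j}/n_{..}$, the $\chi^2$ statistic and its degrees of freedom. The cell counts are those of Ex. 11.2.1, taken so that the row totals are 26, 21, 22, 16, 15 and the column totals 33, 34, 33; the critical value 15.507 is the tabled one quoted in the example.

```python
def expected_counts(table):
    """Estimated expected frequencies e_ij = n_i. * n_.j / n_.. ."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (statistic, degrees of freedom, expected frequencies)."""
    expected = expected_counts(table)
    statistic = sum(
        (n - e) ** 2 / e
        for row_n, row_e in zip(table, expected)
        for n, e in zip(row_n, row_e)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Rows: waist measurement classes A_1..A_5; columns: B_1..B_3.
waist_by_intelligence = [
    [10, 8, 8],
    [6, 7, 8],
    [7, 8, 7],
    [5, 6, 5],
    [5, 5, 5],
]
statistic, df, expected = chi_square_independence(waist_by_intelligence)
CRITICAL_8_DF_5_PERCENT = 15.507    # from a chi-square table
accept_h0 = statistic < CRITICAL_8_DF_5_PERCENT
print(round(statistic, 2), df, accept_h0)
```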

Analogous to the correlation coefficient between two variables, some measures of association between the characteristics of categorization in a two-way contingency table are suggested. Some of the commonly used measures are :

(1) Square contingency

$$\chi^2 = n_{..} \left[ \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{n_{ij}^2}{n_{i.}\, n_{.j}} - 1 \right]$$

(2) K. Pearson's coefficient of contingency $P$

$$P = \sqrt{\frac{\chi^2}{n_{..} + \chi^2}}$$

where $\chi^2$ is the $\chi^2$ calculated from a contingency table under the assumption of independence and $n_{..}$ is the total frequency. Evidently $P = 0 \Leftrightarrow \chi^2 = 0$ and $0 \le P < 1$. When there is complete independence $P = 0$ and vice versa.

Exercises 

11.6. The following table gives the categorized data classifying 100 people according to their intelligence and their mood upon getting up on a particular morning. Is there any evidence of association between these characteristics from the data given ?


Mood \ Intelligence   Highly intelligent   Intelligent   Average   Below average

Good                          15                12           10           5
Tolerable                     10                10            5           5
Intolerable                    5                10            8           5


Calculate a coefficient of contingency. Can you generalize your inference to all the people in a country ?

11.7. The following are the data obtained from an experiment conducted to study the association between the power of concentration (measured in terms of time units) and the ability in Statistics.


                                   Power of concentration

Ability in Statistics   1 hour or less   1-2   2-3   3-4   4 or more

High                          6           6     7     9       10
Average                       5           6     6     6        8
Low                           5           8     9     5        6


Test for independence in this classification at the 5% level.
11.8. The following is a classification of 75 people according to their heights and weights. Test for independence in the classification. Also calculate the Pearson's coefficient of contingency.


Weights \ Heights   55-60   61-65   66 or more

120 or less           10       8         5
121-135                8       9        10
136 or more            5       6        14
CHAPTER 12


ANALYSIS OF DISPERSION 


12.0. Introduction. In the last few chapters we considered 
some of the problems of statistical inference, namely, estima¬ 
tion and testing of statistical hypotheses. In this chapter we will 
present a unified theory for estimation and testing procedures. The 
concept of a statistical population, either defined by a set of given 
data or by a stochastic variable, is already familiar to the reader. 
We will define a measure of dispersion in a univariate population. 

12.1. A MEASURE OF DISPERSION IN A GIVEN DATA 

Let $x_1, x_2, \ldots, x_n$ be the elements of the given population. A measure of scatter of the elements from any point of reference (a point of location) may be defined by the following axioms or desirable properties. Let $m$ be a point of reference and let

$$d_i = x_i - m \quad \text{for } i = 1, 2, \ldots, n.$$

Any function $D$ of $d_1, d_2, \ldots, d_n$ satisfying the following conditions can be taken as a measure of dispersion in $x_1, x_2, \ldots, x_n$ from $m$.

$\alpha_1$. $D(d_1, d_2, \ldots, d_n) \ge 0$ and $D = 0 \Leftrightarrow d_i = 0$ for $i = 1, 2, \ldots, n$.

$\alpha_2$. $D(ad_1, ad_2, \ldots, ad_n) = |a|\, D(d_1, d_2, \ldots, d_n)$ where $a$ is a scalar quantity and $|a|$ is the absolute value of $a$.

$\alpha_3$. $D(d_1 + f_1, d_2 + f_2, \ldots, d_n + f_n) \le D(d_1, d_2, \ldots, d_n) + D(f_1, f_2, \ldots, f_n)$ where $(f_1, f_2, \ldots, f_n)$ is similarly constructed from another population $(y_1, y_2, \ldots, y_n)$.

$\alpha_4$. $D(b_1, \ldots, b_n) = 1$ when $|b_i| = 1$ for $i = 1, 2, \ldots, n$.

Axiom $\alpha_1$ suggests that the measure is always a non-negative quantity and is equal to zero if and only if all the $x_1, \ldots, x_n$ coincide with $m$. $\alpha_2$ says that if the elements are scaled by a scalar quantity $a$, then the measure itself is scaled by the magnitude of $a$. Axiom $\alpha_3$ is equivalent to the statement that the dispersion of a sum is less than or equal to the sum of the dispersions. If all the elements are one unit away from $m$ we would like to have a measure of scatter also equal to unity, or if the elements are $c$ units away from $m$ we would like to have the measure equal to $c$. Axiom $\alpha_4$ together with $\alpha_2$ gives this result. The following are some of the examples for such a function.


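$$D_1 = \left[ \frac{1}{n} \sum_{i=1}^{n} |d_i|^r \right]^{1/r} \quad \text{for } r \ge 1. \qquad (12.1)$$

$$D_2 = \max_i |d_i| \qquad (12.2)$$

$$D_3 = \left[ c_1 |d_1|^r + \cdots + c_n |d_n|^r \right]^{1/r} \quad \text{where } c_i > 0 \text{ for } i = 1, 2, \ldots, n \; ; \; c_1 + \cdots + c_n = 1. \qquad (12.3)$$

$D_1$ for $r = 1$ is equal to

$$\frac{1}{n} \sum_{i=1}^{n} |x_i - m|. \qquad (12.4)$$

This is the usual measure of mean absolute deviation from $m$.

$D_1$ for $r = 2$ is equal to

$$\left[ \frac{1}{n} \sum_{i=1}^{n} |x_i - m|^2 \right]^{1/2}. \qquad (12.5)$$

This is the usual measure of standard deviation when $m$ is the arithmetic mean of the $x$'s. The constants $c_1, c_2, \ldots, c_n$ in $D_3$ satisfy the conditions for a probability measure and hence we can extend the concept of dispersion to a population defined by a stochastic variable.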
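The axioms can be checked numerically for the measures just listed. The following Python sketch, with function names and data of our own choosing, evaluates the $r$th-power measure and the maximum measure on a small hypothetical population and verifies the scaling, triangle and unit properties on these particular numbers. It is of course a check on one data set, not a proof.

```python
def power_dispersion(deviations, r=2.0):
    """[(1/n) * sum |d_i|^r]^(1/r) for r >= 1."""
    n = len(deviations)
    return (sum(abs(d) ** r for d in deviations) / n) ** (1.0 / r)


def max_dispersion(deviations):
    """max over i of |d_i|."""
    return max(abs(d) for d in deviations)


# Hypothetical population and point of reference m (its arithmetic mean).
x = [2.0, 4.0, 4.0, 6.0, 9.0]
m = sum(x) / len(x)
d = [xi - m for xi in x]                  # deviations from m
f = [1.0, -2.0, 0.5, 0.0, 3.0]            # deviations of a second population

mean_abs_dev = power_dispersion(d, r=1)   # mean absolute deviation from m
std_dev = power_dispersion(d, r=2)        # standard deviation (divisor n)
largest = max_dispersion(d)

a = -3.0
scaled_ok = abs(power_dispersion([a * di for di in d]) - abs(a) * std_dev) < 1e-12
summed = [di + fi for di, fi in zip(d, f)]
triangle_ok = power_dispersion(summed) <= std_dev + power_dispersion(f) + 1e-12
unit_ok = abs(power_dispersion([1, -1, 1, -1]) - 1.0) < 1e-12
print(mean_abs_dev, round(std_dev, 4), largest, scaled_ok, triangle_ok, unit_ok)
```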

Let a population be defined by a stochastic variable $X'$ ; let $m$ be a point of location so that

$$X = X' - m.$$

A measure of dispersion $D$ in this population, from the point $m$, may be defined by the following axioms :

$\alpha_1'$. $D(X) \ge 0$ and $D = 0 \Leftrightarrow X = 0$ almost surely.

$\alpha_2'$. $D(aX) = |a|\, D(X)$ where $a$ is a scalar quantity.

$\alpha_3'$. $D(X + Y) \le D(X) + D(Y)$ where $Y = Y' - m$ and $Y'$ represents another population.

$\alpha_4'$. $D(X) = 1$ if $|X| = 1$ almost surely.

The following are some of the functions $D$ which satisfy conditions $\alpha_1'$, $\alpha_2'$, $\alpha_3'$ and $\alpha_4'$ :

$$D_4 = \left[ E_X |X|^r \right]^{1/r} \quad \text{for } r \ge 1 \qquad (12.6)$$

$$D_5 = \sup_X |X| \qquad (12.7)$$

where $E_X$ denotes mathematical expectation with respect to $X$ and sup means the supremum or the maximum of $|X|$.

A number of other measures may be constructed in a similar 

way. 

12.2. THE PRINCIPLE OF MINIMUM DISPERSION 


This is a general principle which enables us to get some
criteria for estimation of parameters and testing of statistical
hypotheses. Suppose that a parameter $\theta$ is estimated by an
estimator $\hat\theta$; then $\hat\theta - \theta$ may be taken as an error in the
estimation. If there exists an estimator which minimizes the
dispersion $D(\hat\theta - \theta)$ then such an estimator may be called the
best, in the sense of minimum dispersion. In general if $\theta$
designates a correct situation and if $\hat\theta$ designates a statistic
designed for $\theta$ and if $\hat\theta$ is evaluated by minimizing any measure
of dispersion $D(\hat\theta - \theta)$, the principle is called the principle of
minimum dispersion. This principle will be illustrated in the
following sections.
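A small numerical sketch may make the principle concrete. It assumes a sample of three numbers (1, 3 and 6, the sample used in Ex. 12.5.1) and searches a grid of candidate locations $m$ for the one that minimizes each of three sample dispersions. The helper names `dispersion` and `best_location` and the grid step are illustrative choices, not part of the text.

```python
def dispersion(xs, m, r=None):
    """Sample dispersion of xs about m.

    r = 1 or 2 gives [sum (1/n)|x_i - m|^r]^(1/r);
    r = None gives the maximum absolute deviation.
    """
    devs = [abs(x - m) for x in xs]
    if r is None:
        return max(devs)
    return (sum(d ** r for d in devs) / len(devs)) ** (1.0 / r)


def best_location(xs, r=None, step=0.01):
    """Grid search for the point m that minimizes the chosen dispersion."""
    lo, hi = min(xs), max(xs)
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda m: dispersion(xs, m, r))


sample = [1, 3, 6]
m_abs = best_location(sample, r=1)   # close to the median
m_sq = best_location(sample, r=2)    # close to the arithmetic mean
m_max = best_location(sample)        # close to the mid-range

print(m_abs, m_sq, m_max)
```

Different measures of dispersion therefore lead to different "best" estimates from the same data, which is why the measure has to be stated along with the estimate.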


12.3. THE PRINCIPLE OF LEAST SQUARES 

Let $y_1, y_2, \dots, y_n$ be a given data and let $y_{10}, y_{20}, \dots, y_{n0}$ be the
hypothetical values of $y_1, \dots, y_n$ respectively; then $e_i = y_i - y_{i0}$ may be
called the error in the $i$th observation, for $i = 1, 2, \dots, n$. If we
take a measure of dispersion as $D_1$ for $r = 2$, that is,

$D = \left[\frac{1}{n}\sum_{i=1}^{n} (y_i - y_{i0})^2\right]^{1/2}$ (12.8)

and if the unknown values, denoted by $y_{i0}$, are estimated by
minimizing $D$, then the estimation procedure is called the principle of
least squares. For example let $x_1, x_2, \dots, x_n$ denote the height
measurements of $n$ persons and let $y_1, y_2, \dots, y_n$ be their weights.
Suppose we assume that there is a linear relationship between the height
and weight such that if the height is given the weight can be estimated
by using a relationship of the form $y = a + bx$ where $y$ and $x$ denote
weight and height respectively. Here our hypothetical model
under the assumption of a linear relationship between $x$ and $y$ is
$a + bx_i$ for $y_i$. The error is $y_i - a - bx_i$ where $a$ and $b$ are unknowns.
We can estimate $a$ and $b$ by the principle of least squares, taking

$D = \left[\frac{1}{n}\sum_{i=1}^{n} (y_i - a - bx_i)^2\right]^{1/2}$ (12.9)
Minimization of $D$ with respect to $a$ and $b$ is equivalent to
minimization of

$L = \sum_{i=1}^{n} (y_i - a - bx_i)^2.$ (12.10)

By using the principles of calculus,

$\frac{\partial L}{\partial a} = 0$ and $\frac{\partial L}{\partial b} = 0$

corresponding to the maximum or minimum values. These
equations are called the normal equations.

$\frac{\partial L}{\partial a} = 0 \Rightarrow -2\sum_{i=1}^{n} (y_i - a - bx_i) = 0 \Rightarrow \sum_{i=1}^{n} (y_i - a - bx_i) = 0$ (12.11)

$\frac{\partial L}{\partial b} = 0 \Rightarrow -2\sum_{i=1}^{n} x_i(y_i - a - bx_i) = 0 \Rightarrow \sum_{i=1}^{n} x_i(y_i - a - bx_i) = 0$ (12.12)


Equations (12.11) and (12.12) yield,

$\sum_{i=1}^{n} y_i - na - b\sum_{i=1}^{n} x_i = 0$ (12.13)

$\sum_{i=1}^{n} x_i y_i - a\sum_{i=1}^{n} x_i - b\sum_{i=1}^{n} x_i^2 = 0$ (12.14)

$\Rightarrow \hat a = \bar y - \hat b\,\bar x$ and $\hat b = \frac{\sum x_i y_i/n - \bar x\,\bar y}{\sum x_i^2/n - \bar x^2} = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$ (12.15)

where $\hat a$ and $\hat b$ denote the estimated values and Cov $(x, y)$ and
Var $(x)$ denote the observed sample covariance and the variance
of $x$, respectively. It may be easily seen that $\hat a$ and $\hat b$ minimize $L$.

Similarly if we have a set of observed values $y_1, y_2, \dots, y_n$ and
hypothetical values $\phi_1(\theta), \dots, \phi_n(\theta)$ respectively and if the
parameters in $\phi(\theta)$ are estimated by minimizing

$L = \sum_{i=1}^{n} (y_i - \phi_i(\theta))^2$ (12.16)

with respect to the parameters, the principle is called the principle
of least squares and the corresponding estimates are called the
least square estimates.
Ex. 12.3.1. Assuming a linear relationship of the form
$y = a + bx$ between heights and weights, fit a straight line to the
following data.

Height (x)     64     65     63     66     67
Weight (y)    125    130    120    140    150

Sol. If $a + bx_i$ is the value set up for $y_i$ the error is

$e_i = y_i - a - bx_i.$

Minimizing the dispersion in $e_i$ for $i = 1, 2, 3, 4, 5$ by
minimizing the dispersion $D_1$ for $r = 2$, or by using the principle of least
squares,

$L = \sum_{i=1}^{5} (y_i - a - bx_i)^2$ (12.17)

and $\frac{\partial L}{\partial a} = 0 = \frac{\partial L}{\partial b}$

$\Rightarrow \hat a = \bar y - \hat b\,\bar x$ and $\hat b = \frac{\sum x_i y_i/n - \bar x\,\bar y}{\sum x_i^2/n - \bar x^2}$ (by the equation (12.15)).

But $\bar x = (64 + 65 + 63 + 66 + 67)/5 = 65$; $\bar y = 133$;

$\sum x_i y_i/n - \bar x\,\bar y = 15$;

$\sum x_i^2/n - \bar x^2 = 2$;

$\hat b = 7.5$, $\hat a = -354.5$.

The estimated equation is

$y = -354.5 + 7.5x.$
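The arithmetic of this example can be checked with a short script. The sketch below applies (12.15) directly; the function name `least_squares_line` is an illustrative choice and nothing beyond the Python standard library is assumed.

```python
def least_squares_line(xs, ys):
    """Return (a_hat, b_hat) for the model y = a + b x, using (12.15)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Observed sample covariance and variance with divisor n, as in the text.
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    b_hat = cov_xy / var_x
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


heights = [64, 65, 63, 66, 67]
weights = [125, 130, 120, 140, 150]

a_hat, b_hat = least_squares_line(heights, weights)
print("a_hat =", a_hat, " b_hat =", b_hat)
```

The divisor $n$ cancels in the ratio Cov/Var, so the same slope is obtained if both quantities are computed with divisor $n - 1$.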

Comments. If more observations were available the
estimates $\hat a$ and $\hat b$ would be different. If there is an exact linear
relationship $y = a + bx$, the two parameters may be evaluated by
using two pairs of values for $x$ and $y$, and if we substitute any
other pair of values for $x$ and $y$ the equation should be satisfied.
Under our assumption of a linear relationship we can only
estimate the parameters. Our assumption may be tested by using a
‘goodness of fit’ test.

Ex. 12.3.2. Fit a curve $y = ab^x$ to the data in Ex. 12.3.1.

Sol. $y = ab^x \Rightarrow \log y = \log a + x \log b$ (12.18)

$Y = A + Bx$ where $Y = \log y$, $B = \log b$ and $A = \log a$. (12.19)

By using the results in the equation (12.15),

$\hat A = \bar Y - \hat B\,\bar x$ and $\hat B = \frac{\sum x_i Y_i/n - \bar x\,\bar Y}{\sum x_i^2/n - \bar x^2}$ (12.20)

With logarithms to the base 10,

$\hat A = 0.5429$, $\hat B = 0.0243$

$\hat a = 3.490$, $\hat b = 1.058$

$\therefore$ The estimated equation is $y = (3.490)(1.058)^x$.
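The logarithmic transformation can be carried out numerically as follows. This is a sketch under two assumptions: logarithms are taken to the base 10, and the function name `fit_exponential` is an illustrative choice. It reuses the formulas of (12.15) on the pairs $(x_i, \log y_i)$ and then takes antilogarithms.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * b**x by least squares on log10(y) = A + B x."""
    logs = [math.log10(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    log_bar = sum(logs) / n
    cov = sum(x * v for x, v in zip(xs, logs)) / n - x_bar * log_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    big_b = cov / var_x
    big_a = log_bar - big_b * x_bar
    return big_a, big_b, 10 ** big_a, 10 ** big_b


heights = [64, 65, 63, 66, 67]
weights = [125, 130, 120, 140, 150]

A_hat, B_hat, a_hat, b_hat = fit_exponential(heights, weights)
print("A =", round(A_hat, 4), " B =", round(B_hat, 4))
print("a =", round(a_hat, 3), " b =", round(b_hat, 4))
# The fitted curve evaluated at the mean height 65.
print("fitted weight at x = 65:", round(a_hat * b_hat ** 65, 2))
```

A quick sanity check on any such fit is that the fitted curve at $x = 65$ should lie close to the observed weights, which are between 120 and 150.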

Comments. From these examples it is clear that any
curve may be fitted to a given data. The goodness of fit depends
on the assumption we make. Then our assumption, that the
relationship is of a particular nature, can be tested. Some more
test statistics for ‘goodness of fit’ will be introduced later. If
our assumption is acceptable, we can predict $y$ based on any
observed $x$; i.e., $\hat y_0 = \hat a + \hat b x_0$ where $x_0$ is the observed $x$, and $\hat y_0$ is
the predicted value of $y$. The error in this prediction depends on
the validity of our model, the error of measurement in $x_0$ etc.

12.31. The scatter diagram. If we have a data of paired
observations on two variables $x$ and $y$ (for example, height and
weight measurements of $n$ persons) we can plot the points $(x, y)$ in
a two dimensional space. Such a diagram is called a scatter
diagram. An illustration is given in Fig. 12.1.

From such a diagrammatic representation it is easy to obtain
some idea about the best fitting model to the data. In Fig. 12.1
a straight line seems to be a good fit. By the principle of least
squares we minimize the sum of the squared distances of the
points from the curve and estimate the unknown quantities in the
model. The goodness of fit may also be tested by assuming some
probability distribution for the error in the model. If there are three
variables the scatter diagram lies in a three dimensional space and
in general if there are $n$ variables the scatter diagram lies in an
$n$ dimensional space. An application of the principle of least
squares will be considered in the next section.


Exercises 

12.1. A statistical population is given by the numbers
22, 27, 30, 35, 32, 31, 30, 26, 26, 27, 29, 24, 23, 22, 20, 26, 23, 26. Obtain the
following measures of dispersion from the point 25: (1) $D_1$ for $r = 1$,
(2) $D_1$ for $r = 2$, (3) $D_2$ (see section 12.1).

12.2. If a statistical population is defined by the stochastic variable
$X$ with the density function

$f(x) = e^{-x}$ for $x > 0$
$f(x) = 0$ elsewhere,

obtain the following measures of dispersion from the point zero:
(1) $D_4$ for $r = 1$, (2) $D_4$ for $r = 2$, (3) $D_5$ (see section 12.1).

12.3. Obtain the least square normal equations in each case if the
following curves are fitted to the data: (1) $y = a + b(x - \bar x)$, (2) $y = a + bx + cx^2$,
(3) $y = ab^x$, (4) $y = ax + b/x$, (5) $xy^a = b$, (6) $x = aye^{-by}$, (7) x—e~ a ^~^ 3 ~ c ’
where $a$, $b$, $c$ are constants and the data is given as follows:

y    $y_1$    $y_2$    ...    $y_n$
x    $x_1$    $x_2$    ...    $x_n$


12.4. Fit the curves (1) $y = a + bx + cx^2$, (2) $y = ax + b/x$, (3) $x = aye^{-by}$
to the following data.

y    10    12    13    15    17    18    19    21    22    24
x     8     9    10    11    13    15    10    18    19    20

12.5. Draw the scatter diagram for the data in problem 12.4.

12.4. LINEAR REGRESSION 

This is a simple problem where we can apply the principle of
least squares to estimate linear relationships between stochastic
variables. If $X$ and $Y$ are two stochastic variables, the conditional
expectation of $Y$ given $X$, that is $E(Y | X)$, is called the regression
of $Y$ on $X$. Similarly $E(X | Y)$ is called the regression of $X$ on $Y$.
These regressions need not be linear. In section 5.32.4 we had seen
that if $X$ and $Y$ have a joint normal probability distribution then
the regressions $E(Y | X)$ and $E(X | Y)$ are linear. If $E(Y | X)$ is
linear then $Y$ is said to have a linear regression on $X$ and vice versa. If
$X_1, \dots, X_n$ are $n$ stochastic variables then $E(X_1 | X_2, \dots, X_n)$ is called
the regression of $X_1$ on $X_2, \dots, X_n$. If this multivariate
regression is linear then it is called a multivariate linear regression.
Multivariate regression of $X_i$ on the other variables, for $i = 1, \dots, n$,
may be similarly defined. In this section we will consider
linear regressions, that is, the cases when $E(X_1 | X_2, \dots, X_n)$ etc.,
are linear. For example a linear regression of $Y$ on $X$ may be
written as,

$E(Y | X) = a + bx$ (12.21)

where $a$ and $b$ are constants which are also called the regression
coefficients. In general a linear regression of $X_1$ on $X_2, X_3, \dots, X_n$
may be written as

$E(X_1 | X_2, X_3, \dots, X_n) = a_1 + a_2 x_2 + a_3 x_3 + \dots + a_n x_n$ (12.22)

where $a_1, a_2, \dots, a_n$ are constant regression coefficients.
‘Conditional expectation’ is discussed in section 5.23.

Theorem 12.1. If $X$ and $Y$ are two stochastic variables
then $E(Y) = E_X[E_Y(Y | X)]$, where $E_X$ and $E_Y$ denote expectation
with respect to $X$ and $Y$ respectively and the conditional
expectation $E_Y(Y | X)$ is treated as a function of $X$.

Proof. Let $X$ and $Y$ be continuous and let $f(x, y)$, $f(x)$, $g(y)$
be the joint density and marginal densities respectively.

$E_Y(Y | X) = \int_y y\,h(y | x)\,dy = \int_y y\,\frac{f(x, y)}{f(x)}\,dy$

where $h(y | x)$ denotes the conditional density of $Y$ given $X$,

$= \frac{1}{f(x)}\int_y y\,f(x, y)\,dy.$ (12.23)

$E_X[E_Y(Y | X)] = \int_x \left[\frac{1}{f(x)}\int_y y\,f(x, y)\,dy\right] f(x)\,dx$

$= \int_x\int_y y\,f(x, y)\,dy\,dx$

$= \int_y y\left[\int_x f(x, y)\,dx\right] dy$

$= \int_y y\,g(y)\,dy = E(Y).$ (12.24)

The proof when $X$ and $Y$ are discrete, or when one variable
is discrete and the other is continuous, is left to the reader.
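For the discrete case the identity can be checked numerically on any small joint distribution. The sketch below uses a made-up joint probability table for $(X, Y)$ (the probabilities are hypothetical and chosen only to sum to one) and exact fractions, so that the two sides can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical joint probabilities P(X = x, Y = y); they sum to 1.
joint = {
    (0, 1): Fraction(1, 8), (0, 2): Fraction(3, 8),
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
}


def marginal_of_x(table):
    """Marginal probabilities of X obtained by summing over y."""
    fx = {}
    for (x, _y), p in table.items():
        fx[x] = fx.get(x, 0) + p
    return fx


def conditional_mean_of_y(table, x):
    """E(Y | X = x) = sum_y y P(x, y) / P(X = x)."""
    px = marginal_of_x(table)[x]
    return sum(y * p for (xx, y), p in table.items() if xx == x) / px


# Left side: E(Y) computed directly from the joint table.
direct = sum(y * p for (_x, y), p in joint.items())

# Right side: E_X[E_Y(Y | X)], the conditional means averaged over X.
iterated = sum(conditional_mean_of_y(joint, x) * px
               for x, px in marginal_of_x(joint).items())

print(direct, iterated)
```

Both quantities come out as the same fraction, which is the discrete analogue of the interchange of integrals in the proof.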
Theorem 12.2. If the regression of $Y$ on $X$ is linear, that
is, if $E(Y | X) = a + bx$, then

$a = \mu_2 - \frac{\sigma_{12}}{\sigma_1^2}\,\mu_1$ and $b = \frac{\sigma_{12}}{\sigma_1^2}$

where $\mu_1$, $\sigma_1^2$ and $\mu_2$, $\sigma_2^2$ are the means and variances of $X$ and
$Y$ respectively and $\sigma_{12}$ is the covariance between $X$ and $Y$.

Proof. $E(Y) = E_X[E_Y(Y | X)]$ (Theorem 12.1)

$= E_X[a + bX]$,

i.e., $\mu_2 = a + b\mu_1$. (12.25)

$E(XY) = E_X[E_Y(XY | X)] = E_X[X\,E_Y(Y | X)]$

$= E[X(a + bX)] = aE(X) + bE(X^2)$,

i.e., $E(XY) = a\mu_1 + bE(X^2)$. (12.26)

Solving (12.25) and (12.26),

$b = \frac{\sigma_{12}}{\sigma_1^2}$ (since $\sigma_{12} = E(XY) - \mu_1\mu_2$ and $\sigma_1^2 = E(X^2) - \mu_1^2$),

$a = \mu_2 - \frac{\sigma_{12}}{\sigma_1^2}\,\mu_1.$ (12.27)

Hence $E(Y | X) = \mu_2 + \frac{\sigma_{12}}{\sigma_1^2}(x - \mu_1).$

Similarly $E(X | Y) = \mu_1 + \frac{\sigma_{12}}{\sigma_2^2}(y - \mu_2).$ (12.28)
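The theorem can be illustrated on a small discrete distribution whose regression is linear by construction. In the sketch below the joint table is hypothetical: $X$ takes the values 0, 1, 2 and, given $X = x$, $Y$ takes the values $2x$ and $2x + 2$ with equal probability, so that $E(Y | X = x) = 1 + 2x$. The coefficients recovered from the moments should then be $a = 1$ and $b = 2$.

```python
from fractions import Fraction

# Hypothetical joint probabilities with E(Y | X = x) = 1 + 2x.
joint = {
    (0, 0): Fraction(1, 8), (0, 2): Fraction(1, 8),
    (1, 2): Fraction(1, 4), (1, 4): Fraction(1, 4),
    (2, 4): Fraction(1, 8), (2, 6): Fraction(1, 8),
}


def expect(g):
    """E[g(X, Y)] under the joint table above."""
    return sum(g(x, y) * p for (x, y), p in joint.items())


mu1 = expect(lambda x, y: x)                        # mean of X
mu2 = expect(lambda x, y: y)                        # mean of Y
var1 = expect(lambda x, y: x * x) - mu1 ** 2        # variance of X
cov12 = expect(lambda x, y: x * y) - mu1 * mu2      # covariance of X and Y

b = cov12 / var1            # slope as given by Theorem 12.2
a = mu2 - b * mu1           # intercept as given by Theorem 12.2

print("a =", a, " b =", b)
```

Exact fractions are used so that the comparison with the constructed regression $1 + 2x$ involves no rounding.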

12.41. Least Square Estimation of the Linear Regression
Equations. Assuming that the regression of $Y$ on $X$ is
linear, we can estimate the regression coefficients by setting up a
model $a + bx$ for $y$ or by letting

$y = a + bx + e$ (12.29)

where $e$ is the error in the model. $a$ and $b$ may be estimated by
the principle of least squares. These estimates are given in (12.15),

$\hat a = \bar y - \hat b\,\bar x$ and $\hat b = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}.$

The estimated equation is

$y - \bar y = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}\,(x - \bar x).$ (12.30)

If there is linear regression of $X$ on $Y$ or if

$E(X | Y) = c + dy$

where $c$ and $d$ are constants, we can estimate the constants by
setting up a linear model for $x$ in the form

$x = c + dy + e'$ (12.31)

where $e'$ is the error in the model. By applying the principle of
least squares, $c$ and $d$ may be estimated. These estimates are
easily seen to be

$\hat c = \bar x - \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(y)}\,\bar y$ and $\hat d = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(y)}.$ (12.32)

The estimated equation is

$x - \bar x = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(y)}\,(y - \bar y).$ (12.33)
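The two estimated regression lines are in general different lines. The following sketch computes both for the height and weight data of Ex. 12.3.1; the helper name `sample_moments` is an illustrative choice. It also checks the standard fact that the product of the two slopes equals the square of the sample correlation coefficient, so that the lines coincide only when the correlation is $\pm 1$.

```python
def sample_moments(xs, ys):
    """Means, covariance and variances with divisor n, as in (12.15)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    var_y = sum(y * y for y in ys) / n - y_bar ** 2
    return x_bar, y_bar, cov, var_x, var_y


heights = [64, 65, 63, 66, 67]          # x
weights = [125, 130, 120, 140, 150]     # y
x_bar, y_bar, cov, var_x, var_y = sample_moments(heights, weights)

# Regression of y on x, equation (12.30).
b_hat = cov / var_x
a_hat = y_bar - b_hat * x_bar

# Regression of x on y, equations (12.32) and (12.33).
d_hat = cov / var_y
c_hat = x_bar - d_hat * y_bar

r_squared = cov ** 2 / (var_x * var_y)

print("y on x: y =", a_hat, "+", b_hat, "x")
print("x on y: x =", round(c_hat, 3), "+", round(d_hat, 5), "y")
print("b_hat * d_hat =", round(b_hat * d_hat, 4), " r^2 =", round(r_squared, 4))
```

Both lines pass through the point of means $(\bar x, \bar y)$, which follows directly from (12.30) and (12.33).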


If there are more than two variables and if there is linear or
non-linear regression the principle of least squares may be used to
estimate the regression coefficients. If we assume a distribution for
$e$ in the equation (12.29) we can construct confidence intervals for
the regression coefficients.

For example, if $e_i$ is assumed to have a normal distribution
$N(0, \sigma)$ for $i = 1, 2, \dots, n$ then the likelihood function is

$L(e_1, e_2, \dots, e_n) = \prod_{i=1}^{n} f(e_i) = \frac{\exp\left[-\sum_{i=1}^{n} (y_i - a - bx_i)^2/2\sigma^2\right]}{\sigma^n(\sqrt{2\pi})^n}$

and $a$ and $b$ can be estimated by using the principle of maximum
likelihood. Confidence intervals for $a$ and $b$ can be obtained from the
distributions of $\hat a$ and $\hat b$. The assumption of $e_i$ : $N(0, \sigma)$ in the
equation (12.29) is equivalent to the assumption that

$Y_i$ : $N(a + bx_i, \sigma)$ and the $x_i$ are constants.


Ex. 12.41.1. Obtain the regression of $X$ on $Y$ and $Z$, where the
joint density function of $X$, $Y$ and $Z$ is given as,

$f(x, y, z) = \frac{1}{3}(x + y)e^{-z}$ for $0 < x < 1$, $0 < y < 2$, $z > 0$
$= 0$ elsewhere.

Sol. $f(y, z) = \int_0^1 f(x, y, z)\,dx = \frac{1}{3}e^{-z}\int_0^1 (x + y)\,dx = \frac{1}{3}e^{-z}(y + 1/2)$ (12.34)

$f(x | y, z) = f(x, y, z)/f(y, z) = \frac{x + y}{y + 1/2}$ (12.35)

The regression of $X$ on $Y$ and $Z$ is $E(X | Y, Z)$

$= \int_0^1 x\,f(x | y, z)\,dx = \frac{1}{y + 1/2}\int_0^1 x(x + y)\,dx = \frac{1/3 + y/2}{y + 1/2}$ (12.36)

Therefore the regression curve is

$E(X | Y, Z) = \frac{2 + 3y}{3(1 + 2y)}$ (12.37)
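The closed form for this regression can be compared with a direct numerical integration. The sketch below assumes the density is $\frac{1}{3}(x + y)e^{-z}$ on $0 < x < 1$, $0 < y < 2$, $z > 0$ (the normalizing constant cancels in the conditional density, so only the form $(x + y)e^{-z}$ matters for the result). The midpoint rule and the helper names are illustrative choices.

```python
import math


def density(x, y, z):
    """Assumed joint density (1/3)(x + y)exp(-z) on its support, else 0."""
    if 0 < x < 1 and 0 < y < 2 and z > 0:
        return (x + y) * math.exp(-z) / 3.0
    return 0.0


def midpoint_integral(g, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def regression_x_on_yz(y, z):
    """E(X | Y = y, Z = z) as a ratio of two integrals over x."""
    num = midpoint_integral(lambda x: x * density(x, y, z), 0.0, 1.0)
    den = midpoint_integral(lambda x: density(x, y, z), 0.0, 1.0)
    return num / den


def closed_form(y):
    """The regression curve (2 + 3y) / (3(1 + 2y))."""
    return (2 + 3 * y) / (3 * (1 + 2 * y))


for y, z in [(0.5, 1.0), (1.0, 0.2), (1.5, 3.0)]:
    print(y, z, round(regression_x_on_yz(y, z), 6), round(closed_form(y), 6))
```

The numerical values do not change with $z$, in agreement with the fact that the regression curve depends on $y$ only.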


12 .5. MINIMAX PROCEDURES 

The following is a note on minimax procedures. We have
seen that if the parameters are estimated by minimizing $D_1$ for
$r = 2$ (see section 12.1), the corresponding procedure is called the
least square method. If $x_1, \dots, x_n$ are the observed values with
$y_{10}, \dots, y_{n0}$ being the corresponding hypothetical values and if the
parameters are estimated by minimizing the maximum error
(min $D_2$ or $\min_\theta \max_i |x_i - y_{i0}|$, see section 12.1), the estimation
procedure is called the minimax procedure. If $\hat\theta$ is selected as an
estimator for $\theta$ then

$E_{\hat\theta}|\hat\theta - \theta|$, $\{E_{\hat\theta}|\hat\theta - \theta|^r\}^{1/r}$ for $r \ge 1$, etc.

are some of the measures of dispersion in $\hat\theta - \theta$, where $E_{\hat\theta}$ denotes
the expectation with respect to the stochastic variable $\hat\theta$. The
maximum dispersion can be denoted by $\max_\theta D(\hat\theta - \theta)$ or $\sup_\theta D(\hat\theta - \theta)$
where $D(\hat\theta - \theta)$ is any measure of dispersion in $(\hat\theta - \theta)$. For
different estimators $\sup_\theta D(\hat\theta - \theta)$ may be different. If an estimator
$\hat\theta$ is such that it minimizes the maximum dispersion, that is,
$\sup_\theta D(\hat\theta - \theta)$ is a minimum, then $\hat\theta$ is called the minimax estimator,
and the corresponding procedure of selecting an estimator $\hat\theta$ is
also called the minimax procedure. Minimax estimators need not
always exist.

Ex. 12.5.1. 1, 3 and 6 is a random sample of size 3 from a
population $f(x, \theta) = 1/\theta$ for $0 < x < \theta$ and is zero elsewhere. Obtain a
minimax estimate of $\theta$.

Sol. Under this model the expected or the hypothetical
value of any observation is

$E(X) = \int_0^\theta x\,dx/\theta = \theta/2.$

Hence we can conveniently take the model,

$x_i = \theta/2 + e_i$ where $e_i$ is a random error.

Observed values ($x_i$)              1          3          6
Hypothetical values ($y_{i0}$)    $\theta/2$    $\theta/2$    $\theta/2$

A minimax estimate of $\theta$ is that value of $\theta$ for which
$\max_i |x_i - \theta/2|$ is a minimum with respect to $\theta$. Since all the
observations should be between 0 and $\theta$, it may be considered that the
true value of $\theta$ is greater than or equal to 6. Let us examine
$\max_i |x_i - \theta/2|$ for values of $\theta \ge 6$.

$\theta$                          6      7      8      9
$\max_i |x_i - \theta/2|$         3      2.5    3      3.5

When $\theta = 7$, $\max_i |x_i - \theta/2|$ is a minimum and hence $\hat\theta = 7$.
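The table of maximum errors can be extended to a finer grid of values of $\theta$. The sketch below evaluates $\max_i |x_i - \theta/2|$ on a grid starting at the largest observation; the grid step, the upper limit of the search and the function names are illustrative assumptions, not part of the example.

```python
def max_error(theta, xs):
    """max_i |x_i - theta/2|, the largest error under the model x_i = theta/2 + e_i."""
    return max(abs(x - theta / 2.0) for x in xs)


def minimax_theta(xs, step=0.01, upper=None):
    """Grid search over theta >= max(xs) for the smallest maximum error."""
    lo = max(xs)                       # theta cannot be below the largest observation
    hi = upper if upper is not None else 2 * max(xs)
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda t: max_error(t, xs))


sample = [1, 3, 6]

for theta in (6, 7, 8, 9):
    print(theta, max_error(theta, sample))

theta_hat = minimax_theta(sample)
print("minimax estimate:", theta_hat, " maximum error:", max_error(theta_hat, sample))
```

The minimum is attained where the errors for the smallest and the largest observations are equal, that is, where $\theta/2$ is the mid-range $(1 + 6)/2$.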

Comments. It can be verified that for $\theta$ between 6 and 7 and
between 7 and 8, $\max_i |x_i - \theta/2|$ is greater than 2.5. The least square
estimate of $\theta$ is easily seen to be 6.67 and if we minimize the
dispersion for $r = 1$ (see section 12.1) the estimate is 6. The maximum
likelihood estimate is also 6. Due to lack of mathematical
elegance the minimax procedure becomes difficult. A reader who
finds some difficulty in following the arguments in this section
may omit this section. For other interpretations of the minimax
procedures, such as the procedure as a minimization of
the maximum risk when a decision is taken (an estimator is
selected for an unknown parametric function), and
for further reading, see the references at the end of this chapter.
Sometimes a priori probability distributions are used for the
parameters and the corresponding procedures are called Bayes'
procedures (that is, statistical inference when the parameters are
assumed to have prior distributions).

Exercises 

12.6. Obtain the regression of $Y$ on $X$ from the following
distributions:

(1) $f(x, y) = c\,x(y + 2)$ for $0 < x < 1$, $0 < y < 1$
and 0 elsewhere, where $c$ is a constant.

(2) $f(x, y) = ax + 2xy$ for $0 < x < 1$, $0 < y < 1$
and 0 elsewhere, where $a$ is a constant.

12.7. Obtain the regressions of $Y$ on $X$ and $X$ on $Y$, given that
$f(x | y) = k\,y\,e^{-xy}$ for $x > 0$ and 0 elsewhere, and $g(y) = 1/2$ for
$0 < y < 2$ and 0 elsewhere, where $k$ is a constant.

12.8. If there is linear regression of $X_1$ on $X_2$ and $X_3$ (that is,
$E(X_1 | X_2, X_3) = a + bx_2 + cx_3$), (1) evaluate the regression coefficients, (2)
write down the regression coefficients in terms of the various correlation
coefficients and variances.


12.9. Assuming that there is linear regression of the marks obtained
by the students, on the time spent, estimate the regression equation by the 
method of least squares from the following data. 


Marks obtained (x) 








88 

90 

95 

Time spent ( y) 

2 

25 

3 

3.5 

4 

4.4 

4.6 

4.8 

5 

5.1 


Also obtain the sample correlation coefficient. 

12.10. Assuming that there is linear regression of the marks obtained 
by the students on the time spent and the I.Q’s of the students, estimate 

the regression equation by the method of least squares from the following 
data. 


Marks obtained (a;) 

65 

2 

60 

3 

70 

75 

85 

82 

86 

90 

Time spent ( y ) 

3.1 

3.2 

3 

2.5 

2.2 

1.8 

I.Q’s (z) 

100 

98 

101 

102 

105 

103 

i 10 

1:0 


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12.11- If_the relationship between * • 

- b (y-y) + e > estimate a and 6 from ♦Sri'* 8 ® S8Umed to be of the 
IpJjer® e is ft random error and 3/~is *e 

oft V' 

12.12. If $x_1, x_2, \dots, x_n$ is an observed random sample from a
$N(\mu = 2, \sigma)$ population, obtain the least square estimate of $\sigma^2$ by
constructing an appropriate model.

[Hint. $(x_i - \mu)^2 = \sigma^2 + e_i$.]


W.onwithparame^^, £timate6by mtS'nV'Z dVer^n “‘if 

[Hint. $E(X_i) = \theta$ for $i = 1, 2, \dots, n$; $X_i = \theta + e_i$.]

12.14. If $x_1, x_2, \dots, x_n$ is an observed random sample from an exponential
population with parameter $\theta$, estimate $\theta$ by minimizing the maximum
dispersion [in the sense of minimizing $D_2$ (see Section 12.1)].

12.15. Assuming that the $e_i$ in a regression model $y_i = ax_i + b + e_i$ are
independently distributed as $N(0, \sigma)$ for $i = 1, 2, \dots, n$, obtain the maximum
likelihood estimates of $a$ and $b$. Show that they coincide with the least square
estimates.

[Hint. $e_i$ : $N(0, \sigma)$.

$f(e_i) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(y_i - ax_i - b)^2/2\sigma^2} = f(y_i | x_i)$

$L = \prod_{i=1}^{n} f(e_i) = \frac{1}{(\sqrt{2\pi})^n\sigma^n}\,e^{-\sum_{i=1}^{n}(y_i - ax_i - b)^2/2\sigma^2}$.]


12.16. Show that the maximum likelihood estimate of $\sigma^2$ in problem
12.14 is

$\hat\sigma^2 = \sum_{i=1}^{n} [y_i - \hat a x_i - \hat b]^2/n$

where $(\hat{\ })$ denotes an estimated value. Express $\hat\sigma^2$ in terms of the sample
correlation coefficient $r$.

12.17. Show that $\hat a$ and $\hat b$ are unbiased estimates for $a$ and $b$ in
problem 12.14.

[Hint. $\sum (x_i - \bar x)(y_i - \bar y) = \sum [(x_i - \bar x)(a(x_i - \bar x) + e_i - e_./n)] = a\sum (x_i - \bar x)^2
+ \sum (x_i - \bar x)(e_i - e_./n)$; $E[\sum (x_i - \bar x)(y_i - \bar y)] = a\,E\sum (x_i - \bar x)^2 = a\sum (x_i - \bar x)^2$
since the $x$'s are constants.]

12.18. Show that the variance of $\hat a$ in problem 12.14 is

$\mathrm{Var}(\hat a) = \sigma^2/\sum_{i=1}^{n} (x_i - \bar x)^2.$

[Hint. $E(e_i) = 0$ for all $i$ implies that $E(Y_i) = ax_i + b$ for all $i$;
$\mathrm{Var}(e_i) = \sigma^2$ implies that $\mathrm{Var}(Y_i) = \sigma^2$ for all $i$; the $x$'s are constants.]

12.19. By using the results in problem 12.15 show that the null
hypothesis $H_0 : a = 0$ (no linear regression) is equivalent to $H_0 : \rho = 0$ where $\rho$
is the correlation between $Y$ and $X$.


12.6. EXPERIMENTAL DESIGNS 

Here we will consider another problem where we use the
principle of minimum dispersion, especially the principle of least
squares, for estimating the parameters, and the minimum value
of dispersion, especially the least square minimum, is used for
testing some hypotheses. When an experimenter finds it difficult to
obtain the inference he had expected from his data, he often
consults a statistician regarding this matter. In such cases usually
it is seen that no valid conclusion can be drawn from the data,
because the data was not collected properly or because the 
experiment was not conducted, keeping in mind the method of 
analysis of the experimental data. For example, suppose that 
an experimenter wants to compare two methods of teaching. 
Suppose that he conducts an experiment of teaching two classes 
of students according to the two different methods. The effect 
of a particular method of teaching is a hypothetical quantity. 
It cannot be measured directly. So he may take the average 
marks obtained by the students in the two classes. The average 
marks is not only a measure of the effect of a particular method 
of teaching but is also influenced by the different components of 
variation like, the intelligence of the students in the class, their 
previous acquaintance with the method of teaching, the particular
teacher involved in the teaching etc. In this simple experiment 
there are so many extraneous factors which contribute to the 
average marks of the students. Hence a difference in the average 
marks cannot be taken as a measure of the difference in the 
effects of the two types of teaching. If an experimenter wants 
to compare two characteristics, his experiment should be designed 
in such a way that the data collected should in some sense mea¬ 
sure only the characteristic (treatment) under study. A properly 
planned experiment controls the variation due to all other factors
except the factors (treatments) under study. The reader might 
have noticed the necessity of properly designing an experiment 
in order to draw valid conclusions from the data. Here we will 
not consider the problem of how to design an experiment so that 
a particular analysis can be carried out. We will consider the 
analysis of the data, collected from a properly designed experi¬ 
ment. 


12.61. One-way classification. Suppose that we want to
compare the effects of 3 different types of fertilizers on the yield
of corn. Suppose that 5 test plots which are homogeneous in all
the extraneous variations such as fertility of the soil, climatic
conditions etc., are planted with a variety of corn and fertilizer
number one (say $M_1$) is applied in equal quantities. Another 5
plots which are also homogeneous in every respect are planted
with the same variety of corn and fertilizer number two ($M_2$) is
applied in the same quantity as $M_1$. Similarly another set of 5
plots are planted with the same variety of corn and $M_3$ is
applied. Suppose that the yields of corn of these plots are as
shown below.


m 2 

8 

9 

10 


The average yields of corn under the three fertilizers $M_1$, $M_2$,
$M_3$ are 10, 8.4 and 9.6 respectively. We would like to know
whether the observed differences in the average yields can possibly
be attributed to chance alone, or if instead the three fertilizers
cannot be considered to be equally effective. We would like to
test the hypothesis that there is no difference among the effects
of the three fertilizers in the yield of this variety of corn. If the
hypothesis is rejected (that is, if the differences in the average
yields cannot be taken as chance variation) we would like to test
further whether $M_1$ is more effective than $M_2$, $M_1$ is more effective
than $M_3$ etc., and also to estimate the difference in the effects of the
fertilizers in terms of the observable quantity, the yield of corn.
These are some of the problems that interest an experimenter in
such an experiment. This problem reduces to the problem of
testing the equality of means in three populations. So we will
consider a general model for the analysis. In our example we have
three sets of data, namely, 5 observations in each set.


Let $x_{ij} = \mu_i + e_{ij}$ (12.38)

where $x_{ij}$ denotes the $j$th observation in the $i$th set, $\mu_i$ is the effect
of fertilizer number $i$ and $e_{ij}$ is a random error or an error due to
chance variation in the $j$th observation of the $i$th set. In our
example there are three fertilizers and 5 observations in each set
and hence $i = 1, 2, 3$ and $j = 1, 2, \dots, 5$. The $e_{ij}$'s may be assumed to be
independently and identically distributed. If the corn was planted
in those test plots without the fertilizers $M_1$, $M_2$ and $M_3$ there
would be some yields. So we shall take a more general model for
our experiment,

$\mu_i = \mu + \alpha_i$ (12.39)

where $\mu$ denotes a general effect and $\alpha_i$ denotes the deviation from
the general effect due to fertilizer number $i$, so that

$x_{ij} = \mu + \alpha_i + e_{ij}.$ (12.40)

Since $\mu_i = \mu + \alpha_i$ we may assume that $\sum_i \alpha_i = 0$. In
general if there are $k$ sets with $n$ observations in each set, so that
$i = 1, 2, \dots, k$ and $j = 1, 2, \dots, n$, and if the $e_{ij}$'s are independently
and identically distributed and if the $j$th observation in the $i$th set
can be written in the form

$x_{ij} = \mu + \alpha_i + e_{ij}$ (12.41)

then this model for the analysis of the data is called a one-way
classification design model and the corresponding experimental unit is
said to belong to a one-way classification. Under the model
(12.41) we would like to test the hypothesis that there is no
difference in the effects, that is,

$H_0 : \mu_1 = \mu_2 = \dots = \mu_k$, which is the same as

$H_0 : \alpha_1 = 0 = \alpha_2 = \dots = \alpha_k$, since $\sum_i \alpha_i = 0$ is assumed.

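Let us consider the estimation of the effects and testing of
various hypotheses in a one-way classification. For convenience
we will apply the principle of least squares in order to estimate
the various effects.

$e_{ij} = x_{ij} - \mu - \alpha_i$ (12.42)

$e_{ij}^2 = (x_{ij} - \mu - \alpha_i)^2$ (12.43)

$\sum_{ij} e_{ij}^2 = \sum_{ij} (x_{ij} - \mu - \alpha_i)^2 = L$ (say) (12.44)

where $\sum_{ij} = \sum_{i=1}^{k}\sum_{j=1}^{n}$. (12.45)

Minimize $L$ with respect to the parameters $\mu, \alpha_1, \alpha_2, \dots, \alpha_k$.
It is easier to take $\mu_i = \mu + \alpha_i$ and minimize $L$ with respect to
$\mu_1, \mu_2, \dots, \mu_k$. The normal equations are

$\frac{\partial L}{\partial \mu_1} = 0$, $\frac{\partial L}{\partial \mu_2} = 0$, ..., $\frac{\partial L}{\partial \mu_k} = 0.$ (12.46)

$\frac{\partial L}{\partial \mu_i} = 0 \Rightarrow -2\sum_{j=1}^{n} (x_{ij} - \mu_i) = 0$ (12.47)

(write $L = (x_{11} - \mu_1)^2 + (x_{12} - \mu_1)^2 + \dots + (x_{1n} - \mu_1)^2 + (x_{21} - \mu_2)^2 + \dots
+ (x_{kn} - \mu_k)^2$ and differentiate $L$ partially with
respect to $\mu_i$)

$\Rightarrow \sum_{j=1}^{n} (x_{ij} - \mu_i) = 0$

$\Rightarrow \sum_{j=1}^{n} x_{ij} - \sum_{j=1}^{n} \mu_i = 0$

$\Rightarrow x_{i.} - n\mu_i = 0$

$\Rightarrow \hat\mu_i = \frac{x_{i.}}{n}$ (12.48)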

where $(\hat{\ })$ denotes an estimated value. We will use the notation

$\sum_{i=1}^{k} x_{ij} = x_{.j}$, $\sum_{j=1}^{n} x_{ij} = x_{i.}$, $\sum_{ij} x_{ij} = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij} = x_{..}$ etc.,

that is, a summation with respect to a suffix is denoted by a dot.
But $\mu_i = \mu + \alpha_i$,

$\hat\alpha_p - \hat\alpha_q = \hat\mu_p - \hat\mu_q = \frac{x_{p.}}{n} - \frac{x_{q.}}{n}$ (12.49)

i.e., the difference in the $p$th and the $q$th treatment effects is estimated
by $\frac{x_{p.}}{n} - \frac{x_{q.}}{n}$.

The least square minimum $S^2$ or the minimum value of $L$ is
obtained by substituting $\hat\mu_i$ for $\mu_i$, since $\hat\mu_i$ can be shown to
minimize $L$.

$S^2 = \sum_{ij} (x_{ij} - \hat\mu_i)^2$ (12.50)

$= \sum_{ij} \left(x_{ij} - \frac{x_{i.}}{n}\right)^2 = \sum_{ij} x_{ij}^2 - \sum_i \frac{x_{i.}^2}{n}$ (12.51)

Let the data or the observations be as shown below:

         Set 1       Set 2       Set 3       ...    Set k
         $x_{11}$    $x_{21}$    $x_{31}$    ...    $x_{k1}$
         $x_{12}$    $x_{22}$    $x_{32}$    ...    $x_{k2}$
         $x_{13}$    $x_{23}$    $x_{33}$    ...    $x_{k3}$
         ...         ...         ...         ...    ...
         $x_{1n}$    $x_{2n}$    $x_{3n}$    ...    $x_{kn}$
Sum      $x_{1.}$    $x_{2.}$    $x_{3.}$    ...    $x_{k.}$

$\frac{x_{i.}}{n}$ = the arithmetic mean in the $i$th set,

$\frac{x_{1.}}{n}$ = arithmetic mean in the first set,

$\frac{x_{2.}}{n}$ = arithmetic mean in the second set etc.

Hence $S^2 = \sum_{i=1}^{k}\sum_{j=1}^{n} \left(x_{ij} - \frac{x_{i.}}{n}\right)^2$

may be called a measure of the within set variation.

Consider the hypothesis

$H_0 : \mu_1 = \mu_2 = \dots = \mu_k$

or $\alpha_1 = 0 = \alpha_2 = \dots = \alpha_k$ (12.52)

i.e., our hypothesis is that there is no difference among the
treatment effects or the differences among the observed values may
be attributed to chance variation. Under this hypothesis the
model is

$x_{ij} = \mu + 0 + e_{ij}$ for $i = 1, 2, \dots, k$; $j = 1, 2, \dots, n$. (12.53)

The least square estimate of $\mu$ under $H_0$ is obtained by
minimizing

$L_0 = \sum_{ij} (x_{ij} - \mu)^2.$ (12.54)

The normal equation is $\frac{\partial L_0}{\partial \mu} = 0$

$\Rightarrow \sum_{ij} x_{ij} - \sum_{ij} \mu = 0$ (12.55)

$\Rightarrow \hat\mu = \frac{x_{..}}{nk}.$ (12.56)


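The least square minimum $S_0^2$ under $H_0$ is

$S_0^2 = \sum_{ij} \left(x_{ij} - \frac{x_{..}}{nk}\right)^2.$ (12.57)

Since $\frac{x_{..}}{nk}$ is the arithmetic mean of all the observations and
$\frac{S_0^2}{nk}$ is the variance in all the observations, $S_0^2$ may be called a
measure of the total variation in the data. Hence $S_0^2 - S^2$ may
be called a measure of the variation due to the hypothesis $H_0$.
The minimum variation under the general model is $S^2$ and the
minimum variation under the restricted model (restricted by $H_0$)
is $S_0^2$. Therefore $S_0^2 - S^2$ can be attributed to the variation due
to $H_0$.

$S_0^2 - S^2 = \sum_{ij} \left(x_{ij} - \frac{x_{..}}{nk}\right)^2 - \sum_{ij} \left(x_{ij} - \frac{x_{i.}}{n}\right)^2$

$= \sum_i \frac{x_{i.}^2}{n} - \frac{x_{..}^2}{nk} = n\sum_i \left(\frac{x_{i.}}{n} - \frac{x_{..}}{nk}\right)^2.$ (12.58)

But $\frac{x_{i.}}{n}$ is the mean of the $i$th set and $\frac{x_{..}}{nk}$ is the grand
mean. Hence

$S_0^2 - S^2 = n\sum_i \left(\frac{x_{i.}}{n} - \frac{x_{..}}{nk}\right)^2$

may be called a measure of between set variation.

$S_0^2 = \sum_{ij} \left(x_{ij} - \frac{x_{..}}{nk}\right)^2 = \sum_{ij} x_{ij}^2 - \frac{x_{..}^2}{nk}$ = total variation. (12.59)

$S^2 = \left(\sum_{ij} x_{ij}^2 - \frac{x_{..}^2}{nk}\right) - \left(\sum_i \frac{x_{i.}^2}{n} - \frac{x_{..}^2}{nk}\right) = \sum_{ij} x_{ij}^2 - \sum_i \frac{x_{i.}^2}{n}$ = within set variation. (12.60)

$S_0^2 - S^2 = n\sum_i \left(\frac{x_{i.}}{n} - \frac{x_{..}}{nk}\right)^2 = \sum_i \frac{x_{i.}^2}{n} - \frac{x_{..}^2}{nk}$ = between set variation. (12.61)

But $S_0^2 = S^2 + (S_0^2 - S^2)$. (12.62)

Total variation = within set variation + between set variation.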
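The decomposition (12.62) can be verified numerically from the definitions. The sketch below uses a small made-up data set of three sets with three observations each (the numbers are hypothetical) and computes the three sums of squares directly from the set means and the grand mean; the function name is an illustrative choice.

```python
def one_way_sums_of_squares(sets):
    """Return (total, within, between) for k sets of n observations each."""
    k = len(sets)
    n = len(sets[0])
    grand_mean = sum(sum(s) for s in sets) / (n * k)
    set_means = [sum(s) / n for s in sets]

    # Total variation: squared deviations from the grand mean.
    total = sum((x - grand_mean) ** 2 for s in sets for x in s)
    # Within set variation: squared deviations from each set's own mean.
    within = sum((x - m) ** 2 for s, m in zip(sets, set_means) for x in s)
    # Between set variation: n times squared deviations of the set means.
    between = n * sum((m - grand_mean) ** 2 for m in set_means)
    return total, within, between


# Hypothetical observations, three sets of three.
data = [[9, 11, 10], [8, 8, 9], [10, 9, 11]]

total, within, between = one_way_sums_of_squares(data)
print("total   =", round(total, 4))
print("within  =", round(within, 4))
print("between =", round(between, 4))
print("within + between =", round(within + between, 4))
```

The identity holds for any data with equal set sizes, because the cross-product term vanishes when each set is centred at its own mean.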
The technique of splitting up the total variation in the
data into the variations due to the different components of
variation is called the analysis of variance technique. In our example
we have taken account of the part of the total variation which
can possibly be attributed to the variation due to the hypothesis
$H_0 : \alpha_1 = 0 = \alpha_2 = \dots = \alpha_k$, viz., that the different treatment effects
are zero. We can test $H_0$ by testing the significance of the
variation due to the null hypothesis (that is, $S_0^2 - S^2$).


Our assumption in the model is that the $e_{ij}$'s are independently
and identically distributed with mean zero and variance $\sigma^2$. It
may be easily shown that

$(S_0^2 - S^2)/(k - 1)$ (under $H_0$) and $S^2/(nk - k) = S^2/k(n - 1)$

are unbiased estimates of $\sigma^2$. (The proof of this part is left to
the reader.) Hence if we assume that the $e_{ij}$'s are independently
normally distributed with mean zero and variance $\sigma^2$ (this is
equivalent to the statement that the $X_{ij}$'s are independently normally
distributed with mean $\mu + \alpha_i$ and variance $\sigma^2$ for $i = 1, 2, \dots, k$ and
$j = 1, 2, \dots, n$) then, under $H_0$,

$\frac{S_0^2 - S^2}{\sigma^2} = \frac{n}{\sigma^2}\sum_i \left(\frac{x_{i.}}{n} - \frac{x_{..}}{nk}\right)^2$ : $\chi^2_{k-1}$ (12.63)

$\frac{S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{ij} \left(x_{ij} - \frac{x_{i.}}{n}\right)^2$ : $\chi^2_{k(n-1)}$ (12.64)

But these two $\chi^2$'s can be shown to be independent. (The
proof of this step is beyond the scope of this book.)

Hence $\frac{(S_0^2 - S^2)/(k - 1)}{S^2/k(n - 1)} = F_{k-1,\,k(n-1)}$ (12.65)

This variance ratio may be tested for significance.
If this $F$ with $k - 1$ and $k(n - 1)$ degrees of freedom is not
significant we can accept $H_0 : \alpha_1 = 0 = \dots = \alpha_k$, or the treatment
effects are all zero. If $F_{k-1,\,k(n-1)}$ is significant, not all the
treatment effects are equal to zero. Some may be zero and some
may not be zero. In this case we shall investigate further and
see which are zero and which are not zero. This aspect will not
be discussed in this book.

For convenience and simplicity the analysis may be given
in a tabular form called the analysis of variance table.

Variation due to       Degrees of        Sum of squares    Mean squares (M.S.)              F-ratio    Inference
col. (1)               freedom (d.f.)    (S.S.)            col. (3)/col. (2)
                       col. (2)          col. (3)

Between sets           $k - 1$           $S_0^2 - S^2$     $(S_0^2 - S^2)/(k - 1) = B$      $B/E$
(between samples)

Within sets            $k(n - 1)$        $S^2$             $S^2/k(n - 1) = E$
(error or residual)

Total                  $kn - 1$          $S_0^2$





To facilitate computation the following computational
procedure may be given:

1. Compute $\sum_{ij} x_{ij} = x_{..}$

2. Compute the correction factor (C.F.) $= x_{..}^2/nk$

3. Compute $\sum_{i=1}^{k} \frac{x_{i.}^2}{n}$

4. Compute $S_0^2 = \sum_{ij} x_{ij}^2 -$ C.F.

5. Compute $S_0^2 - S^2 = \sum_{i=1}^{k} \frac{x_{i.}^2}{n} -$ C.F.

6. Obtain $S^2$ by subtraction [$S^2 = S_0^2 - (S_0^2 - S^2)$].

After obtaining these the rest of the quantities in the
analysis of variance table can be calculated easily.

Ex. 12.61.1. The marks obtained by 10 students according to
3 different methods of teaching are given in the following table.
Assuming that the experiment is planned in such a way that a
one-way classification model may be set up for the data, test whether the
three methods are equally effective, at 5% level.

                                                                  Total
Method A    50   60   60   65   70   80   75   80   85   75      700
Method B    60   60   65   70   75   80   70   75   85   80      720
Method C    40   50   50   60   60   60   65   75   70   70      600
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Sol. Let x it be the j th observation in the i th set 

i= 1, 2, 3 and 

j=\, 2, ... 10. 

According to our notation n= 10 and Jc= 3. 
x.. = Z # f 3 = 700+720+600=2020 

C.F. =» 2 fnlc= 136013.33 

71 %. 2 J 

z = H 7 [ 70 ° 2 +72° 2 + eoo 2 ]=] 368 40 

i = 1 

1 —C.F. = 826.67 

*=1 » 

\ 

Z x% — C.F. = 50 2 -(-60 2 -f 60 2 -f ... + 70 2 —C.F. =3636.67 

v 


The analysis of variance table. 


Variation due to 

d.f. 

S-S. 

M. S. 

F-ratio 

Inference 

Between methods 

error 

w Si 

II 

r-H 

1 

826.67 

2810.00 

413.335 

1405.000 

413.335 

I4u5. 

not 

significant 

Total 

nk— 1-29 

3636.67 





Error sum of squares = (3636.67)-(826.67) = 2310.00 and the 
corresponding degrees of freedom=29—2 = 27. 


The tabled value of F 2 , 27 is in between 3.32 and 3.37 at the 
5% level. The observed F 2j27 = 413.335/1406<1 and hence the 
hypothesis may be accepted or it may be assumed that the three 
methods of teaching are equally effective. 

Comments. In this example and in the theory discussed we assumed that the number of observations in each set is the same for all the sets. When the different sets have numbers of observations $n_1, n_2, \ldots, n_k$, where all the $n_i$'s are not equal, the theory may be developed in a similar fashion. Some exercises are given at the end of the chapter.

12.62. Two-way Classification. In our example of Section 12.61 we had three different fertilizers and one variety of corn. [...] Suppose now that there are two types of treatments, fertilizers and varieties. If $x_{ij}$ denotes the observation corresponding to the $i$th fertilizer and the $j$th variety, we can assume that $x_{ij}$ is the result of a general effect (say $\mu$), the effect due to the $i$th fertilizer (say $\alpha_i$), the effect due to the $j$th variety (say $\beta_j$) and a chance variation (say $e_{ij}$):

$$x_{ij} = \phi(\mu, \alpha_i, \beta_j, e_{ij}) \qquad (12.66)$$

where $\phi$ is a function of $\mu$ (general effect), $\alpha_i$, $\beta_j$ and $e_{ij}$.

If we assume that $\phi = \mu + \alpha_i + \beta_j + e_{ij}$ (that is, the contribution of these different components is additive) we may write

$$x_{ij} = \mu + \alpha_i + \beta_j + e_{ij}. \qquad (12.67)$$

In our example $i = 1, 2, 3$ and $j = 1, 2, 3, 4$ (since there are 3 fertilizers and 4 varieties). In general we may have $r$ fertilizers and $t$ varieties, $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, t$. In an experiment designed to study the effects of two types of treatments (for example fertilizers and varieties as described above), say $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_t$, an observation corresponding to the $i$th treatment of one type ($A_i$) and the $j$th treatment of the other type ($B_j$) may be denoted by $x_{ij}$. If we have more than one observation corresponding to the $(ij)$ combination of treatments we can denote the $k$th observation corresponding to the $(ij)$ treatment combination by $x_{ijk}$.

In the following discussion we shall consider the analysis when there is only a single observation corresponding to a treatment combination. A model of the form (12.68) is called a simple two-way classification model. If an experiment is designed in such a way that it satisfies all the conditions in the model (12.68) then such experimental data is said to belong to a simple two-way classification.


Model

$$x_{ij} = \mu + \alpha_i + \beta_j + e_{ij}, \quad i = 1, 2, \ldots, r;\; j = 1, 2, \ldots, t \qquad (12.68)$$

$\sum_{i=1}^{r} \alpha_i = 0$, $\sum_{j=1}^{t} \beta_j = 0$ and the $e_{ij}$'s are independently and identically distributed with zero mean and variance equal to $\sigma^2$. Here $\alpha_1, \alpha_2, \ldots, \alpha_r$, $\beta_1, \beta_2, \ldots, \beta_t$ are assumed to be constants. Such a model is called a simple, additive, fixed effect, two-way classification model without interaction. The concept of interaction will be discussed later. The conditions $\sum_i \alpha_i = 0 = \sum_j \beta_j$ are justified since we can assume that $\alpha_i$ is the deviation from the general effect due to the factor (or treatment) $A_i$ and $\beta_j$ is the deviation from the general effect due to the factor (or treatment) $B_j$.

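12.62.1. Estimation of the parameters. The parameters $\mu$, $\alpha_i$, $\beta_j$, for $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, t$ may be easily estimated by the principle of least squares.

$$e_{ij} = x_{ij} - \mu - \alpha_i - \beta_j \qquad (12.69)$$

$$\sum_{i=1}^{r}\sum_{j=1}^{t} e_{ij}^2 = \sum_{i,j} (x_{ij} - \mu - \alpha_i - \beta_j)^2 = L \text{ (say)}. \qquad (12.70)$$

Minimize $L$ with respect to $\mu, \alpha_1, \ldots, \alpha_r, \beta_1, \ldots, \beta_t$. For convenience let $\mu_i = \mu + \alpha_i$. The normal equations are

$$\frac{\partial L}{\partial \mu_i} = 0 \text{ for } i = 1, 2, \ldots, r \quad\text{and}\quad \frac{\partial L}{\partial \beta_j} = 0 \text{ for } j = 1, 2, \ldots, t. \qquad (12.71)$$

$$\frac{\partial L}{\partial \mu_i} = 0 \;\Rightarrow\; -2\sum_{j=1}^{t} (x_{ij} - \mu_i - \beta_j) = 0 \;\Rightarrow\; \sum_{j=1}^{t} (x_{ij} - \mu_i - \beta_j) = 0 \qquad (12.72)$$

$$\Rightarrow\; x_{i.} - t\mu_i = 0 \quad \Big(\text{since } \sum_{j=1}^{t} \beta_j = 0\Big) \;\Rightarrow\; \hat\mu_i = \frac{x_{i.}}{t} \qquad (12.73)$$

Since $\sum_i \alpha_i = 0$ we have $\mu = \sum_i \mu_i / r$, so that

$$\hat\mu = \frac{x_{..}}{rt}, \qquad \hat\alpha_i = \hat\mu_i - \hat\mu = \frac{x_{i.}}{t} - \frac{x_{..}}{rt} \qquad (12.74)$$

$$\frac{\partial L}{\partial \beta_j} = 0 \;\Rightarrow\; \sum_{i=1}^{r} (x_{ij} - \mu_i - \beta_j) = 0 \;\Rightarrow\; \hat\beta_j = \frac{x_{.j}}{r} - \frac{x_{..}}{rt} \qquad (12.75)$$

$$\hat\mu_i + \hat\beta_j = \frac{x_{i.}}{t} + \frac{x_{.j}}{r} - \frac{x_{..}}{rt} \qquad (12.76)$$

The least square minimum (since $\mu_i = \hat\mu_i$ and $\beta_j = \hat\beta_j$ can be shown to minimize $L$) is

$$S^2 = \sum_{i,j} \left(x_{ij} - \frac{x_{i.}}{t} - \frac{x_{.j}}{r} + \frac{x_{..}}{rt}\right)^2 = \sum_{i,j} x_{ij}^2 - \sum_{i} \frac{x_{i.}^2}{t} - \sum_{j} \frac{x_{.j}^2}{r} + \frac{x_{..}^2}{rt} \qquad (12.77)$$

(The simplification is left to the reader. The fact that $\sum_{i,j} \hat e_{ij}\hat\mu_i = 0 = \sum_{i,j} \hat e_{ij}\hat\beta_j$, where $\hat e_{ij} = x_{ij} - \hat\mu_i - \hat\beta_j$, due to the normal equations, may be used to advantage.)
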
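Let us consider the hypothesis that there is no difference in the effects of $A_1, A_2, \ldots, A_r$, that is,

$$H_0 : \alpha_1 = \alpha_2 = \ldots = \alpha_r = 0 \quad\text{or}\quad \mu_1 = \mu_2 = \ldots = \mu_r = \mu \qquad (12.78)$$

Under $H_0$, the model becomes

$$x_{ij} = \mu + \beta_j + e_{ij}. \qquad (12.79)$$

The least square estimates are

$$\hat\mu = \frac{x_{..}}{rt} \quad\text{and}\quad \hat\beta_j = \frac{x_{.j}}{r} - \frac{x_{..}}{rt} \qquad (12.80)$$

and the least square minimum $S_0^2$ under $H_0$ is

$$S_0^2 = \left(\sum_{i,j} x_{ij}^2 - \frac{x_{..}^2}{rt}\right) - \left(\sum_{j} \frac{x_{.j}^2}{r} - \frac{x_{..}^2}{rt}\right) \qquad (12.81)$$

The sum of squares due to the hypothesis $H_0$ is

$$S_0^2 - S^2 = \sum_{i} \frac{x_{i.}^2}{t} - \frac{x_{..}^2}{rt} \qquad (12.82)$$

This may be called the sum of squares due to the A factors or the treatments $A_1, A_2, \ldots, A_r$. Similarly the sum of squares due to the B factors may be obtained as

$$\sum_{j} \frac{x_{.j}^2}{r} - \frac{x_{..}^2}{rt} \qquad (12.83)$$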


The analysis of variance table for testing these two hypotheses may be set up as follows.

| Variation due to | d.f. | S.S. | M.S. | F-ratio | Inference |
|---|---|---|---|---|---|
| A-treatments | $r-1$ | $\sum_i x_{i.}^2/t - x_{..}^2/(rt) = T_1$ | $T = T_1/(r-1)$ | $T/E$ | |
| B-treatments | $t-1$ | $\sum_j x_{.j}^2/r - x_{..}^2/(rt) = T_2$ | $U = T_2/(t-1)$ | $U/E$ | |
| Error | $rt-r-t+1 = (r-1)(t-1)$ | (by subtraction) $= E_1$ | $E = E_1/[(r-1)(t-1)]$ | | |
| Total | $rt-1$ | $\sum_{i,j} x_{ij}^2 - x_{..}^2/(rt) = T_3$ | | | |


If we assume that the $e_{ij}$'s are independently normally distributed with parameters $\mu = 0$ and $\sigma$, then by an argument similar to the one used in a one-way classification analysis, we can show that $T/E$ is an F with $r-1$ and $(r-1)(t-1)$ degrees of freedom, and $U/E$ is an F with $t-1$ and $(r-1)(t-1)$ degrees of freedom. Hence the hypotheses $\alpha_1 = \alpha_2 = \ldots = \alpha_r = 0$ and $\beta_1 = \beta_2 = \ldots = \beta_t = 0$ may be tested by $T/E$ and $U/E$ respectively.


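A two-way classification of data is given in the following table:

| | $B_1$ | $B_2$ | ... | $B_t$ | Total |
|---|---|---|---|---|---|
| $A_1$ | $x_{11}$ | $x_{12}$ | ... | $x_{1t}$ | $x_{1.}$ |
| $A_2$ | $x_{21}$ | $x_{22}$ | ... | $x_{2t}$ | $x_{2.}$ |
| ... | ... | ... | ... | ... | ... |
| $A_r$ | $x_{r1}$ | $x_{r2}$ | ... | $x_{rt}$ | $x_{r.}$ |
| Total | $x_{.1}$ | $x_{.2}$ | ... | $x_{.t}$ | $x_{..}$ |

Computational procedure:

1. Compute $\sum_{i,j} x_{ij}^2$ and $x_{..}$.
2. Compute the correction factor (C.F.) $= x_{..}^2/(rt)$.
3. Compute $T_1 = \sum_{i} x_{i.}^2/t - \text{C.F.}$
4. Compute $T_2 = \sum_{j} x_{.j}^2/r - \text{C.F.}$
5. d.f. for error $= (rt-1) - (r-1) - (t-1) = (r-1)(t-1)$.
6. S.S. for error $= T_3 - T_1 - T_2$, where $T_3 = \sum_{i,j} x_{ij}^2 - \text{C.F.}$ is the total sum of squares.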
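The computational procedure for the two-way classification with one observation per cell can be sketched as follows. The function name `two_way_anova` and the dictionary keys are assumptions made for this illustration; rows of the input are taken to be the A-treatments and columns the B-treatments.

```python
from typing import Dict, Sequence


def two_way_anova(table: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Two-way ANOVA, one observation per cell; rows are A, columns are B."""
    r = len(table)
    t = len(table[0])
    if r < 2 or t < 2 or any(len(row) != t for row in table):
        raise ValueError("need a rectangular table with r >= 2 and t >= 2")

    row_totals = [sum(row) for row in table]            # x_i.
    col_totals = [sum(col) for col in zip(*table)]      # x_.j
    grand_total = sum(row_totals)                       # x..

    cf = grand_total ** 2 / (r * t)                     # correction factor
    t1 = sum(v * v for v in row_totals) / t - cf        # S.S. due to A
    t2 = sum(v * v for v in col_totals) / r - cf        # S.S. due to B
    t3 = sum(x * x for row in table for x in row) - cf  # total S.S.
    e1 = t3 - t1 - t2                                   # error S.S. by subtraction

    ms_error = e1 / ((r - 1) * (t - 1))                 # E in the table
    return {
        "ss_a": t1,
        "ss_b": t2,
        "ss_error": e1,
        "ss_total": t3,
        "f_a": (t1 / (r - 1)) / ms_error,               # T/E
        "f_b": (t2 / (t - 1)) / ms_error,               # U/E
    }
```

The values `f_a` and `f_b` are the ratios T/E and U/E, to be compared with the tabled F values with $(r-1, (r-1)(t-1))$ and $(t-1, (r-1)(t-1))$ degrees of freedom respectively.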

Ex. 12.62.1. The observations from an experiment which is designed to use the model (12.68) and compare the effects of 3 types of fertilizers on the yields of 4 varieties of corn, are given in the following table. Test the hypotheses (1) there is no difference among the fertilizers, (2) there is no difference among the varieties, as far as the yields are concerned.

| Fertilizers | Variety $B_1$ | Variety $B_2$ | Variety $B_3$ | Variety $B_4$ | Total |
|---|---|---|---|---|---|
| $A_1$ | 15 | 20 | 22 | 20 | 77 |
| $A_2$ | 20 | 30 | 32 | 28 | 110 |
| $A_3$ | 20 | 35 | 38 | 32 | 125 |
| Total | 55 | 85 | 92 | 80 | 312 |

Sol. C.F. $= (312)^2/12 = 8112$

$$\sum_{i} \frac{x_{i.}^2}{t} - \text{C.F.} = (77^2 + 110^2 + 125^2)/4 - 8112 = 301.5$$

$$\sum_{j} \frac{x_{.j}^2}{r} - \text{C.F.} = (55^2 + 85^2 + 92^2 + 80^2)/3 - 8112 = 259.3$$

$$\sum_{i,j} x_{ij}^2 - \text{C.F.} = (15^2 + 20^2 + \ldots + 32^2) - 8112 = 598$$

The residual S.S. = 598 - (259.3 + 301.5) = 37.2

Here $r = 3$ and $t = 4$. The various degrees of freedom are $r-1 = 2$, $t-1 = 3$, $(r-1)(t-1) = 6$, $rt-1 = 11$. The analysis of variance table is:

| Variation due to | d.f. | S.S. | M.S. | F-ratio | Inference |
|---|---|---|---|---|---|
| Between fertilizers | 2 | 301.5 | 150.75 | 150.75/6.19 = 24.3 | Significant |
| Between varieties | 3 | 259.3 | 86.44 | 86.44/6.19 = 14.0 | Significant |
| Residual | 6 | 37.2 | 6.19 | | |
| Total | 11 | 598.0 | | | |

The tabled values of F at the 1% level are $F_{2,6} = 10.9$ and $F_{3,6} = 9.78$.

The observed F-values are greater than the tabled values and therefore we reject the hypotheses. We cannot assume that the varieties are the same or that the fertilizers are the same, as far as the yields are concerned.

Comments. Individual hypotheses, such as whether any two particular varieties have equal effects or any two fertilizers have equal effects, etc., may be tested by using a Student t test. This will not be discussed in this book.

A two-way classification is different from a contingency table. In a contingency table we have frequencies whereas in a two-way classification table we have observations.

In the example discussed above we considered the problem of taking a single observation corresponding to every treatment combination. For example, corresponding to $A_1$ and $B_1$ we take one observation etc. Instead of taking a single observation we may take a number of observations in each cell. [...] When there are $n_{ij}$ observations corresponding to $A_i$ and $B_j$ for $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, t$ (that is, $n_{11}$ observations in the (1, 1)th cell, $n_{12}$ observations in the (1, 2)th cell etc.) then the normal equations and the analysis become complicated.


In the model (12.68) we have taken only the variation due to $A_i$ and $B_j$. There is a possibility of a joint variation in the sense that there may be a contribution due to the particular combination $(A_i, B_j)$ or the effect of $A_i$ may be varying with respect to the B factors. In such cases the joint variation is called interaction and the model (12.68) may be modified as

$$x_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ij} \qquad (12.84)$$

where $\gamma_{ij}$ denotes the interaction between $A_i$ and $B_j$. Study of interaction is not possible in the model (12.84) unless we have more than one observation corresponding to $A_i$ and $B_j$ or we put more restrictions on the model (12.84). If we have a number of observations corresponding to $A_i$ and $B_j$ we can write the $k$th observation in the $(ij)$th cell as

$$x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk} \qquad (12.85)$$

and the analysis may be carried out. These ideas and definitions may be easily generalized for a $p$-way classification.

Sometimes we may have to take more than one set of observations corresponding to a treatment in order to have a valid inference. For example consider an experiment in which $k$ different diets are tried on $k$ sets of a particular variety of cattle. If all types of extraneous variations are controlled a model of the form

$$x_{ij} = \mu + \alpha_i + e_{ij}$$

is in order, where $x_{ij}$ denotes the increase in weight. But it is difficult to get animals of the same initial weight and the initial weights may have some effect on the increase in weight. The initial weights (say $y_{ij}$) are observable and hence we may use a modified model

$$x_{ij} = \mu + \alpha_i + b\,y_{ij} + e_{ij} \qquad (12.86)$$

where $b$ is a constant. This additional variable is often called a concomitant variable. For the simplicity of the analysis we assumed that the concomitant variable is related to $x_{ij}$ in the form of a linear regression as given in the model (12.86). The analysis of a model involving one or more concomitant variables is called the analysis of covariance. If there is a concomitant variable present in a two-way classification model without interaction, we may use the modified model

$$x_{ij} = \mu + \alpha_i + \beta_j + c\,y_{ij} + e_{ij} \qquad (12.87)$$

where $c$ is a constant.


12.63. Randomized Block Experiments. This is a special experimental design where we can use a two-way classification model without interaction for the analysis of the experimental data. Consider an agricultural experiment, designed to study the effect of $t$ types of fertilizers on the yield of wheat. Take a section of land, called a block, which is homogeneous in all the extraneous variations such as fertility of the soil, climatic conditions etc. Divide it into $t$ plots of the same shape and dimensions. Apply the $t$ fertilizers at random to these $t$ plots in order to avoid possible variations between plots due to uncontrolled factors. Plant the wheat of the particular variety under consideration. We shall replicate or repeat the experiment by taking another block of land which is homogeneous within, dividing it into $t$ plots, applying the fertilizers at random and planting the same type of wheat. These blocks are homogeneous within but there may be between-block variations. For example one block may be in one province and the other may be in another province. If we have $r$ such blocks and all the $t$ treatments are tried in all the blocks, we have a randomized block experiment. A convenient model for a randomized block experiment is

$$x_{ij} = \mu + \alpha_i + \beta_j + e_{ij} \qquad (12.88)$$

where the $\alpha_i$ are the block effects and the $\beta_j$ are the treatment effects.

If we have a large number of treatments it may be difficult to get a homogeneous block so that we can try all the treatments. In such cases other designs known as incomplete block designs are used. Some of the commonly used incomplete block designs are: (1) balanced incomplete block designs, (2) partially balanced incomplete block designs, (3) lattice designs etc.
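The random allocation of treatments to plots within each block of a randomized block experiment can be sketched as follows. The function name `randomized_block_layout`, the `seed` parameter and the treatment labels are hypothetical and are used only for illustration.

```python
import random
from typing import List, Optional, Sequence


def randomized_block_layout(
    treatments: Sequence[str], n_blocks: int, seed: Optional[int] = None
) -> List[List[str]]:
    """Return, for each block, a random order of all the treatments.

    Position p of the inner list is the treatment applied to plot p
    of that block, so every treatment occurs exactly once per block.
    """
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    layout = []
    for _ in range(n_blocks):
        plots = list(treatments)
        rng.shuffle(plots)     # independent randomization within each block
        layout.append(plots)
    return layout


if __name__ == "__main__":
    # Hypothetical example: t = 4 fertilizers tried in r = 3 blocks.
    for number, block in enumerate(randomized_block_layout(["F1", "F2", "F3", "F4"], 3), 1):
        print("block", number, block)
```

Each block receives its own independent shuffle, which corresponds to repeating the random allocation of fertilizers afresh in every replicate.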

12.64. Latin Square Designs. A latin square of order $n$ is an arrangement of $n$ elements into $n$ rows and $n$ columns such that every element appears in each row and column once and only once. A latin square of order 3 is given below. The elements are the latin letters A, B and C.

    A  B  C
    B  C  A
    C  A  B

Such a design can be conveniently used to test hypotheses regarding three sets of treatments $A_1, A_2, \ldots, A_n$; $B_1, B_2, \ldots, B_n$ and $C_1, C_2, \ldots, C_n$. Take one set of treatments corresponding to the rows, another corresponding to the columns and the third corresponding to the latin letters. The various hypotheses may be tested by taking $n^2$ (the number of cells in the latin square) observations. A convenient fixed effect non-interaction model is

$$x_{ij(k)} = \mu + \alpha_i + \beta_j + \gamma_k + e_{ij(k)} \qquad (12.89)$$

$\alpha_i$'s, $\beta_j$'s and $\gamma_k$'s are the effects of the three sets of treatments. For convenience we will call the $\alpha_i$'s the row effects, the $\beta_j$'s the column effects and the $\gamma_k$'s the treatment effects.

$$\sum_{i=1}^{n} \alpha_i = 0 = \sum_{j=1}^{n} \beta_j = \sum_{k=1}^{n} \gamma_k, \quad E[e_{ij(k)}] = 0 \quad\text{and}\quad \mathrm{Var}[e_{ij(k)}] = \sigma^2$$

are assumed. $x_{ij(k)}$ denotes the observation corresponding to the $k$th treatment if it appears in the $(i, j)$th ($i$th row, $j$th column) cell. If we assume further that the $e_{ij(k)}$ are independently distributed as a $N(0, \sigma)$ then the analysis of variance table for a latin square experiment may be given as follows. (Here the normality assumption is used only to test the various hypotheses by using F-tests.)


| Variation due to | d.f. | S.S. | M.S. = (col. 3)/(col. 2) | F-ratio | Inference |
|---|---|---|---|---|---|
| Rows | $n-1$ | $\sum_i x_{i..}^2/n - x_{...}^2/n^2$ | $R$ | $R/E$ | |
| Columns | $n-1$ | $\sum_j x_{.j.}^2/n - x_{...}^2/n^2$ | $C$ | $C/E$ | |
| Treatments | $n-1$ | $\sum_k x_{..k}^2/n - x_{...}^2/n^2$ | $T_r$ | $T_r/E$ | |
| Residual | $(n-1)(n-2)$ | (by subtraction) | $E$ | | |
| Total | $n^2-1$ | $\sum_{i,j} x_{ij(k)}^2 - x_{...}^2/n^2$ | | | |

The degrees of freedom for the residual $= (n^2-1) - (n-1) - (n-1) - (n-1) = (n-1)(n-2)$.

The residual sum of squares = Total S.S. - Row S.S. - Col. S.S. - Treat. S.S.

The various F-ratios in the 5th column have $n-1$ and $(n-1)(n-2)$ degrees of freedom. Other designs related to a latin square design are graeco-latin squares, Youden squares etc.

There are a number of other designs which are commonly used, such as factorial designs, split-plot designs, nested designs etc. So far we have been considering only fixed effect models. That is, for example, if we have a one-way classification model,

$$x_{ij} = \mu + \alpha_i + e_{ij}$$

we assumed that the $\alpha_i$'s are constants. If the treatments used are only a random sample from a finite or an infinite set of treatments, the $\alpha_i$'s may be assumed to be the values assumed by a stochastic variable. A model in which the various effects are assumed to be stochastic variables is called a random effect or a variable effect model. If in a model some effects are assumed to be constants and some variables, such a model is called a mixed effect model. For a detailed discussion of the different designs and models see the bibliography at the end of this chapter.
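The sums of squares in the latin square analysis of Section 12.64 can be sketched in Python as follows. The function name `latin_square_anova`, the argument names `design` and `obs`, and the dictionary keys are assumptions made for this illustration. The design is passed as a square array of treatment letters and the observations as a square array of the same shape.

```python
from typing import Dict, Sequence


def latin_square_anova(
    design: Sequence[Sequence[str]], obs: Sequence[Sequence[float]]
) -> Dict[str, float]:
    """Sums of squares and F-ratios for an n x n latin square experiment."""
    n = len(design)
    symbols = sorted(set(design[0]))
    lines = [list(row) for row in design] + [list(col) for col in zip(*design)]
    # Every row and every column must contain each of the n symbols once.
    if n < 3 or len(symbols) != n or any(sorted(line) != symbols for line in lines):
        raise ValueError("design is not a latin square of order n >= 3")

    grand_total = sum(sum(row) for row in obs)
    cf = grand_total ** 2 / n ** 2                       # x...^2 / n^2
    ss_total = sum(x * x for row in obs for x in row) - cf
    ss_rows = sum(sum(row) ** 2 for row in obs) / n - cf
    ss_cols = sum(sum(col) ** 2 for col in zip(*obs)) / n - cf

    # Treatment totals x..k: add up the cells carrying each letter.
    totals = {s: 0.0 for s in symbols}
    for i in range(n):
        for j in range(n):
            totals[design[i][j]] += obs[i][j]
    ss_treat = sum(v * v for v in totals.values()) / n - cf

    ss_resid = ss_total - ss_rows - ss_cols - ss_treat   # by subtraction
    ms_resid = ss_resid / ((n - 1) * (n - 2))            # E in the table
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_treat": ss_treat,
        "ss_resid": ss_resid,
        "ss_total": ss_total,
        "f_rows": (ss_rows / (n - 1)) / ms_resid,
        "f_cols": (ss_cols / (n - 1)) / ms_resid,
        "f_treat": (ss_treat / (n - 1)) / ms_resid,
    }
```

Each of the three F-ratios is to be compared with the tabled F with $n-1$ and $(n-1)(n-2)$ degrees of freedom. The order must be at least 3, since for $n = 2$ the residual has no degrees of freedom.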

Exercises 

12.19. The following table gives the I.Q.'s of 4 groups of students. Assuming that these observations may be considered to be random samples from independent populations $N(\mu_i, \sigma)$ for $i = 1, 2, 3, 4$, test the hypothesis that $\mu_1 = \mu_2 = \mu_3 = \mu_4$ at the 5% level by using an analysis of variance technique.

Group 1. 100, 102, 105, 101, 98, 115, 112, 110, 114, 108. 

Group 2. 98, 100, 100, 102, 95, 110, 108, 112, 111, 104. 

Group 3. 105, 102, 103, 104, 99, 100, 107, 106, 107, 102. 

Group 4. 99, 105, 110, 112, 98, 102, 98, 100, 100, 104. 

12.20. The results of a completely randomized experiment (where a 
one-way classification model is appropriate) conducted to study the yields 
of 3 varieties of wheat are given below. Test at the A% level whether all the 
varieties can be considered to be the same as far as the yields are 
concerned. 


Varieties 1, 2, 3. Yields: 10 12 11 13 15 14 15 17 1 20 16 12 14 15 13 11

12.21. In a one-way classification with the model

$$x_{ij} = \mu + \alpha_i + e_{ij}, \quad i = 1, 2, \ldots, k;\; j = 1, 2, \ldots, n$$

$E(e_{ij}) = 0$ and $\mathrm{Var}(e_{ij}) = \sigma^2$ for all $i$ and $j$, show that

[...]

under the hypothesis $H_0 : \alpha_1 = \alpha_2 = \ldots = \alpha_k = 0$.

12.22. In a one-way classification, if the number of observations in the $i$th set is $n_i$ for $i = 1, 2, \ldots, k$, show that the sum of squares due to between set variation is

$$\sum_{i=1}^{k} \frac{x_{i.}^2}{n_i} - \frac{x_{..}^2}{n_.}$$

where $x_{ij}$, $x_{i.}$, $x_{..}$ and $n_.$ denote the $j$th observation in the $i$th set, $\sum_j x_{ij}$, $\sum_{i,j} x_{ij}$ and $\sum_i n_i$ respectively.

[Hint. Use the model $x_{ij} = \mu + \alpha_i + e_{ij}$ for $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, n_i$. Obtain the least square minima under the general model and under the restricted model, restricted by the hypothesis $H_0 : \alpha_1 = \alpha_2 = \ldots = \alpha_k$.]

12.23. Show that the degrees of freedom for the residual in the model of problem 12.14 is $(n_. - 1) - (k - 1) = (n_. - k)$.

12.24. For the observations in problem 12.14 show that

[...]


12.25. Three processes A, B and C are used in a production process. Assuming that the design is such that we can use a one-way classification model, test at the 5% level whether the three processes can be considered to be equivalent as far as the outputs are concerned. The following observations on the outputs are made:

Method A: 10, 12, 13, 11, 10, 14, 15, 13
Method B: 9, 11, 10, 12, 13
Method C: 11, 10, 15, 14, 12, 13

12.26. The yields of 3 varieties of corn according to two methods of planting are given in the following table. Assuming that a two-way classification model without interaction is appropriate for the analysis, test the various hypotheses at the 5% level.

| | Variety $B_1$ | Variety $B_2$ | Variety $B_3$ |
|---|---|---|---|
| Method $A_1$ | 10 | 15 | 22 |
| Method $A_2$ | 8 | 16 | 26 |

12.27. The results of a randomized block experiment conducted to study the effects of three methods of spinning on the breaking strength of 3 types of cotton are given in the following table. The observations are the 'breaking strengths'. Test the hypothesis that the three methods are equally effective, at the 5% level.

| | Type 1 | Type 2 | Type 3 |
|---|---|---|---|
| Method 1 | 22 | 21 | 20 |
| Method 2 | 18 | 20 | 24 |
| Method 3 | 19 | 22 | 24 |


12.28. In a two-way classification model with interaction

$$x_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk}, \quad i = 1, 2, \ldots, r;\; j = 1, 2, \ldots, t;\; k = 1, 2, \ldots, n;$$

obtain the sum of squares due to interaction.

12.29. If $n = 1$ in problem 12.28 can the interaction sum of squares be estimated?

12.30. Set up the analysis of variance table for the model in problem 12.28.

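12.31. In a latin square design with the model

$$x_{ij(k)} = \mu + \alpha_i + \beta_j + \gamma_k + e_{ij(k)}, \quad i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, n;\; k = 1, 2, \ldots, n;$$

show that the residual sum of squares is

$$\left(\sum_{i,j} x_{ij(k)}^2 - \frac{x_{...}^2}{n^2}\right) - \left(\sum_{i} \frac{x_{i..}^2}{n} - \frac{x_{...}^2}{n^2}\right) - \left(\sum_{j} \frac{x_{.j.}^2}{n} - \frac{x_{...}^2}{n^2}\right) - \left(\sum_{k} \frac{x_{..k}^2}{n} - \frac{x_{...}^2}{n^2}\right)$$

where $x_{i..}$, $x_{.j.}$ and $x_{..k}$ denote the sum of the observations in the $i$th row, the $j$th column and corresponding to the $k$th treatment respectively.

12.32. Obtain the degrees of freedom for the residual in problem 12.31.

12.33. A latin square design and the corresponding observations are given below. Write down the analysis of variance table and test the various hypotheses at the 5% level.

Design

    A  B  C
    B  C  A
    C  A  B

Observations

     8   7  10
    10  12  14
     9  14  16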


12.7. A GENERAL TEST STATISTIC 

In the experimental design problems in section 12.6 we used the variance-ratio tests for testing the various null hypotheses. The least square minima under the various hypotheses were obtained and, under the assumption of normality for the error in the model, the comparison of the least square minima led to a variance-ratio test. In general the minimum dispersion may be used as a criterion for testing various hypotheses regarding a mathematical model set up for observed data. A comparison of the minimum value of a measure of dispersion will yield test statistics for testing various hypotheses.

12.71. Kolmogorov-Smirnov Statistic. This is a convenient statistic for testing 'goodness of fit' and was formulated by two Russian mathematicians and hence the statistic is named after them. In chapter 11 we discussed the use of a chi-square statistic for testing 'goodness of fit'. But in order to use the chi-square test the data had to be classified and furthermore the frequencies must be sufficiently large. This condition restricts the use of a chi-square statistic when the frequencies are small or when a classification is not desirable for a given data. In such a situation an exact test is usually given by the Kolmogorov-Smirnov statistic $D_n$.

Consider a goodness of fit problem. Let $x_1, \ldots, x_n$ be the observed data. We want to test whether this sample can be considered to be a random sample from a specified distribution $F(x, \theta_0)$. For example, suppose we have an observed sample of size 15 and that we would like to decide whether this sample has come from an exponential population with the parameter $\theta = 2$.

Let $u_1 \le u_2 \le \ldots \le u_n$ be the arrangement of $x_1, \ldots, x_n$ according to the order of the magnitude of the $x$'s. Then the sample distribution function is given by the formula

$$S_n(x) = \begin{cases} 0, & x < u_1 \\ r/n, & u_r \le x < u_{r+1} \\ 1, & x \ge u_n \end{cases} \qquad (12.99)$$

$S_n(x)$ is the cumulative frequency function. Let the hypothesis to be tested be that the $x$'s came from a population with the density function $f(x, \theta_0)$ or with the distribution function $F(x, \theta_0)$, where $\theta_0$ denotes that the parameters are fixed or that the distribution is completely specified. Therefore the hypothetical value for $S_n(x)$ is $F(x, \theta_0)$. The error in this model is

$$e = S_n(x) - F(x, \theta_0) \qquad (12.100)$$

If we use the measure of dispersion $D_5$ (see section 12.1, equation 12.7) then

$$D(e) = \sup_x |S_n(x) - F(x, \theta_0)| \qquad (12.101)$$

The minimum dispersion

$$= \min_\theta D(e) = \sup_x |S_n(x) - F(x, \theta_0)| \qquad (12.102)$$

[since all the parameters are specified, $\min_\theta D(e) = D(e)$].

$$D_n = \sup_x |S_n(x) - F(x, \theta_0)| \qquad (12.103)$$

is called the Kolmogorov-Smirnov statistic. Instead of using the measure $D_5$, if we use the measure of dispersion $D_4$ for $r = 2$ (see section 12.1, equation 12.6) we get

$$W = \min_\theta \{E_x |S_n(x) - F(x, \theta_0)|^2\}^{1/2} = \{E_x |S_n(x) - F(x, \theta_0)|^2\}^{1/2} \qquad (12.104)$$

$W$ is usually called the $W^2$-statistic for goodness of fit. Evidently $D_n$ and $W^2$ are stochastic variables. Their distributions can be worked out for any particular sample size $n$. The critical values are tabulated for various values of sample size and hence the 'goodness of fit' may be tested by using a $D_n$ or $W^2$ statistic.
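A minimal sketch of the computation of $D_n$ in (12.103), assuming a completely specified continuous $F(x, \theta_0)$. Since $S_n(x)$ is a step function which jumps only at the ordered observations $u_r$, while $F$ is non-decreasing, the supremum is reached at some $u_r$, either just before the jump or at the jump; both gaps are therefore checked. The names `normal_cdf` and `ks_statistic` and the sample used are illustrative assumptions, not part of the text.

```python
import math
from typing import Callable, Sequence


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Distribution function of N(mu, sigma), sigma being the standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """D_n = sup over x of |S_n(x) - F(x)| for a fully specified continuous F."""
    u = sorted(sample)
    n = len(u)
    d = 0.0
    for r, value in enumerate(u, start=1):
        f = cdf(value)
        # At u_r the step function is at least r/n; just below u_r it is
        # at most (r-1)/n.  With tied observations the extra intermediate
        # values never exceed the true one-sided gaps, so the maximum is exact.
        d = max(d, abs(r / n - f), abs(f - (r - 1) / n))
    return d


if __name__ == "__main__":
    # Hypothetical sample tested against a hypothetical N(mu=5, sigma=2).
    data = [3.1, 4.0, 4.4, 5.2, 5.9, 6.3, 7.8]
    print(ks_statistic(data, lambda x: normal_cdf(x, 5.0, 2.0)))
```

The value returned is then compared with the tabulated critical value of $D_n$ for the sample size and level chosen.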


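Ex. 12.71.1. Use a $D_n$ statistic to test whether a normal distribution with the parameters $\mu = 13$ and $\sigma = 1$ is a good fit for the data given below: 9, 10, 10, 11, 12, 12, 13, 13, 13, 14, 14, 15, 15, 16.

Sol. $n = 14$;

$$f(x, \theta_0) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-13)^2/2}, \qquad F(x, \theta_0) = \int_{-\infty}^{x} f(t, \theta_0)\,dt = \int_{-\infty}^{x-13} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt, \qquad S_n(x) = S_{14}(x).$$

$S_{14}(x)$ is constant on each interval between successive observed values while $F(x, \theta_0)$ increases over the interval, so the largest value of $|S_{14}(x) - F(x, \theta_0)|$ on an interval is attained at one of its two ends. The various values are given in the following table:

| $u_i$ | Frequency | Interval | $S_{14}(x)$ | $F(x, \theta_0)$ at lower end | $F(x, \theta_0)$ at upper end | Largest $\lvert S_{14}(x) - F(x, \theta_0)\rvert$ |
|---|---|---|---|---|---|---|
| | | $x < 9$ | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| 9 | 1 | $9 \le x < 10$ | 1/14 = 0.0714 | 0.0000 | 0.0013 | 0.0714 |
| 10 | 2 | $10 \le x < 11$ | 3/14 = 0.2143 | 0.0013 | 0.0228 | 0.2130 |
| 11 | 1 | $11 \le x < 12$ | 4/14 = 0.2857 | 0.0228 | 0.1587 | 0.2629 |
| 12 | 2 | $12 \le x < 13$ | 6/14 = 0.4286 | 0.1587 | 0.5000 | 0.2699 |
| 13 | 3 | $13 \le x < 14$ | 9/14 = 0.6429 | 0.5000 | 0.8413 | 0.1984 |
| 14 | 2 | $14 \le x < 15$ | 11/14 = 0.7857 | 0.8413 | 0.9772 | 0.1915 |
| 15 | 2 | $15 \le x < 16$ | 13/14 = 0.9286 | 0.9772 | 0.9987 | 0.0701 |
| 16 | 1 | $x \ge 16$ | 14/14 = 1.0000 | 0.9987 | 1.0000 | 0.0013 |

$F(x, \theta_0)$ for various values of $x$ are obtained from a normal probability table. For example, at $x = 10$,

$$F(10, \theta_0) = \int_{-\infty}^{10} (2\pi)^{-1/2} e^{-(x-13)^2/2}\,dx = \int_{-\infty}^{-3} (2\pi)^{-1/2} e^{-t^2/2}\,dt = 0.0013.$$

$$D_{14} = \sup_x |S_{14}(x) - F(x, \theta_0)| = 0.2699.$$

The tabulated value of $D_{14}$ at the level $\alpha = 0.05$ is 0.35 approximately. This is obtained from a table of $D_n$. (References are given at the end of this chapter.) The observed $D_{14} = 0.2699 < 0.35$ and hence a $N(\mu = 13, \sigma = 1)$ may be considered to be a good fit at the 5% level.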

Comments. In most of the tests mentioned in this chapter we did not assume a basic distribution. Such tests are often called distribution-free tests. In the $D_n$ and $W^2$ statistics we did not consider a problem of testing a null hypothesis against a parametric alternative. These distribution-free tests are sometimes called non-parametric tests. Some other non-parametric tests commonly used are the sign test, rank test, run test etc. A brief account of some of the non-parametric tests is given in the following section. For further reading refer to the bibliography given at the end of this chapter.

Exercises 

12.34. Test the goodness of fit of a Poisson distribution with the parameter $\lambda = 2$ to the following frequency table, at the 5% level, by using a Kolmogorov-Smirnov statistic.

| X | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Frequency | 52 | 63 | 60 | 40 | 22 | 10 | 3 | 1 | 0 |

The tabled value of $D_{251} = 0.085$.

12.35. Test the goodness of fit of a $N(\mu, \sigma)$ to the following frequency table by using a Kolmogorov-Smirnov statistic.

| X | 10 | 12 | 13 | 14 | 15 | 16 | 17 |
|---|---|---|---|---|---|---|---|
| Frequency | 2 | 4 | 8 | 8 | 8 | 4 | 2 |

[Hint. Estimate the parameters. The test is not exact due to the estimation of the parameters. Use the 1% level. The tabled value at the 1% level is $D_{36} = 0.27$.]

12.8. SOME DISTRIBUTION-FREE PROCEDURES

So far we discussed statistical inference when the population under consideration was assumed to be normal in the case of continuous populations. Even though there are a good number of theoretical results which will justify a normality assumption, there are situations where nothing is known about the underlying distribution or a normality assumption may not be desirable. In such cases we resort to some distribution-free procedures, that is, procedures where a particular basic distribution is not assumed. In chapters 10 and 11 we were considering only parametric tests, that is, tests where a particular hypothesis is tested against a parametric alternative. In other words the tests were restrictions on estimable parametric functions. There are other situations such as testing the independence of populations, randomness of data, compatibility of particular data with a theoretical distribution, departure from normality etc. A few distribution-free procedures will be considered here.

12.81. The Sign Test. This is a distribution-free test which can be conveniently applied when the underlying population is known to be continuous and symmetrical. If we want to test a hypothesis that a parameter $\theta = \theta_0$, where $\theta_0$ is a specified value of $\theta$, and if we have a random sample of observations on $\theta$, then a single sample sign test can be formulated as follows. Assign a plus sign to all the observations which exceed $\theta_0$ and a minus sign to all the observations which are less than $\theta_0$. If the population is known to be symmetrical about the ordinate at $\theta = \theta_0$ the probability $p$ of getting a plus sign can be taken to be 1/2 without any loss of generality. So the hypothesis $\theta = \theta_0$ reduces to the hypothesis $p = 1/2$ where $p$ is the probability of a success in a Binomial probability situation. Since the population is continuous, strictly speaking, the probability of getting an observation equal to $\theta_0$ is zero. If there is an observation equal to $\theta_0$ this is caused by rounding of the numbers and hence it may be omitted. In this case the alternative to the hypothesis $p = 1/2$ can be formulated as $p < 1/2$, $p > 1/2$, or $p \ne 1/2$ according to the requirement of the experimenter.


Ex. 12.81.1. The breaking strength, measured at random, of cotton thread spun by a particular process is given in the following data. Test the hypothesis that the expected breaking strength is 10 units, at the 5% level. Assume that the population is continuous and symmetrical and a sign test is in order. 10.1, 10.3, 10.5, 10.1, 10.0, 9.7, 9.8, 10.2, 9.9, 10.2, 10.2, 10.4, 10.1, 9.9, 9.8, 10.1, 10.3.


Sol. We want to test $H_0 : \theta = 10$ against $H_1 : \theta \ne 10$, that is, $H_0 : p = 1/2$ against $H_1 : p \ne 1/2$ (where $p$ is the probability of a success in a binomial situation with 16 trials). One observation equals 10 and hence we omit this from the sample. There are 16 other observations or we have a random sample of size 16. If the numbers greater than 10 are denoted by a plus and the numbers less than 10 are denoted by a minus, we have the following data.

| Observation | 10.1 | 10.3 | 10.5 | 10.1 | 10.0 | 9.7 | 9.8 | 10.2 | 9.9 | 10.2 | 10.2 | 10.4 | 10.1 | 9.9 | 9.8 | 10.1 | 10.3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sign | + | + | + | + | omitted | - | - | + | - | + | + | + | + | - | - | + | + |

$x$ = the number of plus signs = the number of successes = 11. In a two tail binomial test at the 5% level with total number of trials equal to 16, $P\{x > 12\} < 0.025$ and $P\{x \le 3\} < 0.025$.

But the observed number of successes equals 11 and this does not lie in the critical region and hence we will accept the hypothesis at the 5% level.
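The single sample sign test can be sketched with exact binomial tail probabilities as follows. The function name `sign_test` and the returned two-sided probability (twice the smaller tail, capped at 1) are choices made for this illustration. The text itself works with a critical region rather than with a tail probability. `math.comb` requires Python 3.8 or later.

```python
from math import comb
from typing import Sequence, Tuple


def sign_test(sample: Sequence[float], theta0: float) -> Tuple[int, int, float]:
    """Return (number of plus signs, number of trials, two-sided tail probability).

    Observations equal to theta0 are omitted, as in the text.
    Under H0 the number of plus signs is Binomial(n, 1/2).
    """
    plus = sum(1 for x in sample if x > theta0)
    minus = sum(1 for x in sample if x < theta0)
    n = plus + minus
    if n == 0:
        raise ValueError("all observations are equal to theta0")

    upper = sum(comb(n, k) for k in range(plus, n + 1)) / 2 ** n  # P{X >= plus}
    lower = sum(comb(n, k) for k in range(0, plus + 1)) / 2 ** n  # P{X <= plus}
    two_sided = min(1.0, 2.0 * min(upper, lower))
    return plus, n, two_sided


if __name__ == "__main__":
    # Hypothetical measurements tested against theta0 = 5.
    print(sign_test([5.2, 4.9, 5.4, 5.1, 5.0, 5.3, 4.8, 5.6], 5.0))
```

The hypothesis is rejected at the level $\alpha$ of a two tail test when the returned probability does not exceed $\alpha$.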

Comments. When the sample size is large we can use a 
normal approximation. (See section 10.6). The same technique 
can be used for testing equality of two parameters or equality of 
two populations if we have paired samples. If we have n paired 
observations, where the first observations come from one population 
and the second observations come from a second population, then 
a plus sign can be assigned to the pairs for which the first observation 
is greater than the second one and a minus sign is assigned to 
the pairs for which the second observation is greater than the first, 
or vice versa. Now the problem reduces to one similar to the 
single sample problem. Here also the pairs for which there are 
ties will be omitted. The assumption of symmetry in the population 
can be avoided if we take $\theta_0$ as the median in the population 
designated by the stochastic variable X so that $P\{X > \theta_0\} = P\{X < \theta_0\}$. 
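
The paired version described in these comments can be sketched as follows. This is only an illustration under stated assumptions: the function name `paired_sign_test` and the two short data lists are hypothetical, and the two-sided p-value is taken as twice the smaller binomial tail, which is one common convention rather than something the text prescribes.

```python
from math import comb

def paired_sign_test(first, second):
    """Two-sample sign test on paired observations; tied pairs are omitted."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    plus = sum(1 for a, b in zip(first, second) if a > b)
    minus = sum(1 for a, b in zip(first, second) if a < b)
    n = plus + minus
    k = min(plus, minus)
    # P{X <= k} under Binomial(n, 1/2), doubled for a two-tail test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return plus, minus, min(1.0, 2 * tail)

# Hypothetical paired data, for illustration only.
first = [5, 7, 6, 9, 4, 8]
second = [3, 7, 2, 5, 6, 1]
plus, minus, p_value = paired_sign_test(first, second)
```

Here one pair is tied and dropped, leaving 4 plus signs and 1 minus sign out of 5 usable pairs.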

12.82. The Rank Tests. This test is based on the rank 
sums and can be conveniently used for testing the equality of 
populations or in other words for testing whether two samples 
have come from identical populations. The two sample sign test 
could be applied if we had the two samples of the same size. 
This test can be applied even if the sample sizes are different. 
Consider two random samples of sizes $n_1$ and $n_2$. We want to test 
whether the samples have come from identical populations. We 
pool the sample observations and rank them according to the order 
of magnitude. If some observations have the same magnitude 
distribute the mean ranks among the observations. For example 
if there are 3 smallest observations assign the rank (1 + 2 + 3)/3 = 2 
to each of them. The next observation has rank 4 etc. One of the 
tests known as the Mann-Whitney U test is based on the statistic 
U, where, 

$U = n_1 n_2 + n_1(n_1 + 1)/2 - R_1$ 

where $n_1$ and $n_2$ are the sample sizes and $R_1$ is the sum of the 
ranks occupied by the first sample of size $n_1$. It can be shown 
that the mean and variance of U are, 

$E(U) = n_1 n_2/2$ 

and $\sigma_U^2 = n_1 n_2 (n_1 + n_2 + 1)/12$ 

and further the standardized statistic 

$T = [U - E(U)]/\sigma_U$ 

is approximately normally distributed when $n_1$ and $n_2$ are large. 
A good approximation is obtained when $n_1$ and $n_2$ are greater than 
8. Exact tables of U are given in D.B. Owen, Handbook of 
Statistical Tables, Addison Wesley, 1962 (See the bibliography at 
the end of this chapter). 
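
The rule for distributing mean ranks among tied observations can be sketched in a few lines. The function name `mid_ranks` is an assumption made for this illustration; it is not taken from the book.

```python
def mid_ranks(values):
    """Rank values from smallest to largest, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the block of observations tied with position 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end hold ranks pos+1..end+1; every tied value gets their mean.
        mean_rank = (pos + 1 + end + 1) / 2
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks

three_tied = mid_ranks([4, 4, 4, 7])     # three equal smallest values
two_tied = mid_ranks([10, 8, 10, 9])     # two values sharing ranks 3 and 4
```

With three equal smallest observations each receives rank (1 + 2 + 3)/3 = 2 and the next observation receives rank 4, as in the text.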

Ex. 12.82.1. The following data give the increase in weights 
of samples of 10 and 9 experimental animals who are given two 
diets A and B. Test the hypothesis, at the 5% level, that the diets 
are equally effective by testing the hypothesis that the samples have 
come from identical populations. 


Diet A    12   15   14   13   12   11   10   16   17   18 

Diet B    10    9    8   14   19   20   21   22   22 


Sol. We will pool the samples and will arrange the numbers 
according to the order of their magnitudes. In order to distinguish 
the numbers we will use a subscript A for the numbers from diet 
A (first sample). 


Pooled sample     8     9    10   10_A  11_A  12_A  12_A  13_A  14_A   14   15_A 
Ranks             1     2    3.5   3.5    5    6.5   6.5    8    9.5   9.5   11 

Pooled sample   16_A  17_A  18_A   19    20    21    22    22 
Ranks            12    13    14    15    16    17   18.5  18.5 



Here for example there are two numbers equal to 10 which 
are supposed to occupy the ranks 3 and 4 and hence they are 
given the ranks (3 + 4)/2 = 3.5 each. The sum of the ranks 
occupied by the first sample 

= 3.5 + 5 + 6.5 + 6.5 + 8 + 9.5 + 11 + 12 + 13 + 14 = 89 = $R_1$ 

Therefore an observed U 

$= n_1 n_2 + n_1(n_1 + 1)/2 - R_1 = (10)(9) + (10)(11)/2 - 89 = 56.$ 

$E(U) = n_1 n_2/2 = (10)(9)/2 = 45$ 

and $\sigma_U^2 = n_1 n_2 (n_1 + n_2 + 1)/12$ 

$= (10)(9)(10 + 9 + 1)/12 = 150$ 

An observed value of the standardized U 
$= t = [u - E(U)]/\sigma_U$ 
$= (56 - 45)/\sqrt{150} \approx 0.90 < 1.96$ 

Since T has an approximate normal distribution, 

$P\{T > 1.96\} = 0.025$ 

approximately. Hence the observed T does not fall in the critical 
region and hence the hypothesis that the populations are identical 
cannot be rejected from the evidence of these two samples. 
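
The arithmetic of this example can be reproduced with a short script. This is a sketch only: the function name `mann_whitney_u` is illustrative, and the two data lists are read off the pooled sample shown in the solution (an assumption wherever the printed data table is unclear).

```python
from math import sqrt

def mann_whitney_u(sample1, sample2):
    """Return (R1, U, E(U), Var(U), T) using mean ranks for tied values."""
    pooled = sorted(sample1 + sample2)
    # Mean rank of each distinct value: first position plus half the extra ties.
    rank_of = {v: pooled.index(v) + 1 + (pooled.count(v) - 1) / 2
               for v in set(pooled)}
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(rank_of[v] for v in sample1)       # rank sum of the first sample
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 * (n1 + n2 + 1) / 12
    t = (u - mean_u) / sqrt(var_u)
    return r1, u, mean_u, var_u, t

# Data as they appear in the pooled sample of the solution (assumption).
diet_a = [12, 15, 14, 13, 12, 11, 10, 16, 17, 18]
diet_b = [10, 9, 8, 14, 19, 20, 21, 22, 22]

r1, u, mean_u, var_u, t = mann_whitney_u(diet_a, diet_b)
reject = abs(t) > 1.96      # two-tail test at the 5% level
```

The script gives a rank sum of 89, U = 56 and a standardized value near 0.90, so the hypothesis of identical populations is not rejected.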

Comments. This U test can be modified to test the hypothesis 
that two populations are identical, against the alternative 
that the population variances are different. In testing the 
hypothesis that k independent samples have come from identical 
populations, a test known as the Kruskal-Wallis H-test is usually 
used. Another important distribution-free test is the run test. For 
these and related matters see the bibliography at the end of this 
chapter. 


Exercises 


12.36. Use a sign test for testing the hypothesis, at the 5% level, that 
the mean yield of a hybrid corn is 60 units, by using the following data 
which give the yield of corn in 18 test plots. 48, 48.5, 49, 49, 50, 5 , , 54, 

53.5, 52, 49.6, 53, 52.5, 51, 47, 52, 51.5, 53. 


12.37. A test conducted on 41 experimental animals to study the 
interval of time from the time they are subjected to a particular experimental 
condition till they die, yields the following time intervals. 1.92, 1.93, 1.92, 
1.98, 2.00, 2.00, 2.20, 2.10, 2.10, 2.15, 2.17, 2.18, 2.20, 2.21, 2.30, 2.18, 2.10, 1.99, 
1.98, 2.30, 2.22, 2.35, 2.25, 2.17, 2.12, 2.18, 2.19, 1.97, 1.98, 1.96, 2.10, 2.12, 2.15, 
2.24, 2.30, 2.18, 1.94, 1.95, 1.1 6, 19S, 1V9. Use a sign test to test the hypothesis 
that the expected duration $\theta = 2$ against the alternative that $\theta > 2$ at the 
1% level. 


12.38. Two types of missiles are test fired 12 times. The flight distances 
are given below. Pair the observations at random and use a two 
sample sign test at the 5% level, to test the hypothesis that the two types of 
missiles are equally good as far as the expected flight distances are concerned. 


A 2000, 2050, 2045, 2100, 2075, 2070, 2080, 2050, 2030, 2040, 2075, 2055. 

B 1920, 1980, 2100, 2055, 2040, 2025, 2015, 1990, 2045, 2060, 2025, 2035. 

12.39. Out of 40 tea tasters who have tasted two different brands of 
tea, 25 of them preferred brand A, 12 of them preferred brand B and the 
rest could not prefer one to the other. Use a 5% sign test to test the 
hypothesis that brand A is better than brand B. 

12.40. Kruskal-Wallis H test. This test is similar to the sign test 
and is used for testing whether k independent samples have come from k 
identical populations. The samples are pooled and ranked. The statistic 
used is 

$H = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$, where $n = \sum_{i=1}^{k} n_i$, 

$n_i$ is the size of the ith sample and $R_i$ is the sum of the ranks occupied by 
the ith sample. When $n_i > 5$ for all i, H has an approximate chi-square 
distribution with $k - 1$ degrees of freedom under the null hypothesis. 
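
The H statistic can be computed by pooling, ranking and summing ranks per sample. The sketch below assumes the usual form of the statistic, $H = \frac{12}{n(n+1)} \sum R_i^2/n_i - 3(n+1)$; the function name `kruskal_wallis_h` and the three tiny samples are hypothetical and serve only to make the arithmetic easy to follow (they are too small for the chi-square approximation, which needs more than 5 observations per sample).

```python
def kruskal_wallis_h(samples):
    """Kruskal-Wallis H from a list of samples, using mean ranks for ties."""
    pooled = sorted(v for s in samples for v in s)
    n = len(pooled)
    # Mean rank of each distinct value in the pooled sample.
    rank_of = {v: pooled.index(v) + 1 + (pooled.count(v) - 1) / 2
               for v in set(pooled)}
    # Sum over samples of (rank sum squared) / (sample size).
    total = sum(sum(rank_of[v] for v in s) ** 2 / len(s) for s in samples)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical data: rank sums are 6, 15 and 24 with n = 9.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

For these samples the sum of $R_i^2/n_i$ is 12 + 75 + 192 = 279, so H = (12/90)(279) - 30 = 7.2.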


The following data give the percentage reduction in skin rash by the 
help of three different beauty treatments conducted on three random samples 
of girls of a particular category. Test the hypothesis that the three treat¬ 
ments are equally effective, by using a H test, at 1% level. 

12.41. The Run Test. This is a convenient test for testing the 
randomness of a sample after having obtained the sample. The test is based 

Hence we can test the randomness of the sample by using this normal 
approximation. A good approximation is obtained when $n_1$ and $n_2$ are 
greater than 10. Instead of qualitative data as described above, if we have 
numerical data we can easily put the data into a sequence of two letters 
A and B where A denotes the numbers greater than the median and B denotes 
the numbers less than the median. 
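
Counting runs and converting numerical data into a two-letter sequence can be sketched as follows. Several things here are assumptions: the function names are illustrative, values equal to the median are dropped (the text does not say how to treat them), and the mean and variance of the number of runs are the standard Wald-Wolfowitz expressions, since the formulas used in this exercise are not legible in the text above. The sample sequences are hypothetical.

```python
from math import sqrt
from statistics import median

def to_letters(values):
    """Map numbers to 'A' (above the median) or 'B' (below); drop values equal to it."""
    m = median(values)
    return ["A" if v > m else "B" for v in values if v != m]

def count_runs(seq):
    """Number of maximal blocks of identical consecutive symbols."""
    return sum(1 for i in range(len(seq)) if i == 0 or seq[i] != seq[i - 1])

def runs_statistic(seq):
    """Return (runs, mean, variance, z) for a sequence with exactly two symbols."""
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = seq.count(symbols[0])
    n2 = len(seq) - n1
    r = count_runs(seq)
    # Standard large-sample moments of the number of runs (assumed, see lead-in).
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return r, mean, var, (r - mean) / sqrt(var)

# Hypothetical quality sequence: G = non-defective, D = defective.
r, mean, var, z = runs_statistic(list("GGDGGGDDGG"))
letters = to_letters([3, 1, 4, 1, 5, 9, 2, 6])
```

The short sequence above has 5 runs against an expected 5.2; it is far too short for the normal approximation and only illustrates the counting.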

A machine produces a particular article which can be classified as 
defective D or non-defective G. One item every half hour is tested 
for quality. The data is given below. Test at the 1% level, whether the sample 
can be considered to be random as far as the quality specifications of the 
ANSWERS FOR SELECTED QUESTIONS 


Chapter 1 

1.3. (1, 1), (2, 1), ..., (6, 1) 
     (1, 2), (2, 2), ..., (6, 2) 
     ... 
     (1, 6), (2, 6), ..., (6, 6) 

1.4. (a) {1, 2, ..., 52} where the cards are numbered from 1 to 52. 
     (b) {(x, y) | x, y ∈ {1, 2, ..., 52}} 
     (c) {(x, y) | x, y ∈ {1, 2, ..., 52}, x ≠ y} 

1.5. {1, 2, ..., 14} (the balls are numbered from 1 to 14). 

1.6. {(a, b) | a ∈ {1, 2, ..., 52}, b ∈ {0, 1, -1}, b = -1 | a ∈ {27, 28, 
     ..., 52}, b ∈ {0, 1} | a ∈ {1, 2, ..., 26}}. (The cards are 
     numbered from 1 to 52 of which 1 to 26 are red cards. 
     The faces of the coin are assigned the numbers 0 and 1; if 
     no coin is thrown the outcome is denoted by -1). 

1.7. (a) 35 sets, (b) 840 vectors. 

1.8. (a) {(0, 0, 1), (0, 1, 0), (1, 0, 0)} ; (b) same as in (a) ; 
     (c) {(0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)}. Heads and tails 
     are denoted by 1 and 0 respectively. 

1.9. {2, 4, 6} 

1.10. $V_1 + V_2 = (-4, 8, 5)$ ; $V_1 V_2' = 7$ 

1.11. (6, -5, 0, 8) (not unique) 

1.12. $V_1 = (0, 1, 1)$ ; $V_2 = (2, -1, 1)$ (not unique) 

1.13. Not linearly independent. 

1.16. 


-113 4" 




“ 0 " 

2 10 1 


%2 

= 

2 

_0 -3 5 -1_ 


#3 


_ 5 _ 


X 
1.17. 


(a) Tl/2 1/2 
_1 0 

(b) [-1/2 1/2“ 

M 2 - 1 / 2 . 


"1/3 1/3 1/3 
2/3 0 1/3 

- 1/2 1/2 0 
1/3 1/3 1/3“ 
2/3 0 1/3 

,0 2/3 l/3_ 


1.19. p 1 1 1 0 0 0 

110 0 110 

JL 0 1 0 1 0 1 

1.24. $B = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}$ 


1.27. 316. 


Chapter 2 


2.1. (a) 64 ; 15 excluding $\phi$ ; (b) 15 ; 7 excluding $\phi$. 

2.2. 18. 

2.3. $25 \cdot 26^3$. 

2.4. (1) 24 ; (2) 6 ; (3) 360. 

2.5. 25200. 

2.6. (a) 30 ; (b) 100. 

2.7. 

(a) ( 3 0 ) ; 

2.9. 

6 3 . 



2.10. 


2.11. 

(a) 

120 ; 

00 

o 

• 

2.12. 

ro- 

2.14. (d) 3/8 ; -45/2048. 

2.16. 60. 

2.17. 

not 

unique. A={2, 5}, B 

={2, 0, -1} ; A= 


B= 

{2, 5, 

0, -1}. 




respectively. 

(b) {(T, T, T), (H, H, H), (H, H, T), (H, T, H), (T, H, H)} 
2.19. (1) {4, 5, 6}, (2) {0, 1, 2, 3, 4, 5}, (3) {3}, (4) {0, 1, 2}. 
2.21. (a) yes, (b) no, (c) no, (d) no. 

2.23. 0.7 ; 0.8 ; 0.7 ; 0.8 ; 0.3 ; 0.2 ; 0.3. 




2.24. $2^8$. 2.25. 3/5. 

2.27. 15/36. 2.28. (igjCg)/( m”) ' 

2.29. $\binom{200}{20}\binom{100}{10}\binom{50}{10}\binom{50}{0} \Big/ \binom{400}{40}$. 

2.30. yes. 2.31. (a) 1/16, (b) 3/51. 

2.32. Consider an experiment of throwing a coin twice. (a) 
Events of getting 0 head and 3 heads, (b) 0 head and one 
head, (c) {(T, T), (T, H)} and {(T, T), (H, T)}, (d) exactly 
one head and at least one head. 

2.33. 

2.34. 1/3. 2.35. 2/5. 

2.36. (a) 0.80, (b) 0, (c) 0.20. 2.37. 3/8. 

2.38. (a) 2/7, (b) 5/7. 2.39. 2/3. 

2.40. 2/3. 2.41. 0.65. 

2.42. 1/23. 2.43. 0.5558. 

2.44. 0.5558. 




’ ■* 


Chapter 3 

3.1. (a) X: the number of heads in the outcomes ; Y: 2 times the 
number of tails in the outcomes ; (b) X: sum rolled, 
Y: difference rolled ; (c) X: number of boys in the outcomes, 
Y: 2 times the number of girls -3, in the outcomes. 

»*■ /<*>=( * Xw^J/CS)- —°* 

!•(:>:')= ^ /(*). 

x=0 

3.3. /(*) =^ 5 ®|(0.01) i ( 0.99) s »-*, !■(*')”^’_ 0 /(*). 

3.4. /(*) = ^®°^l/2)*(l/2)™-». 

“• «*> - (“x..")/(r.) • 
3.6. (a) no, (b) yes, 

$F(x) = \begin{cases} 0, & x < -1 \\ 1/3, & -1 \le x < 0 \\ 2/3, & 0 \le x < 5 \\ 1, & x \ge 5 \end{cases}$ 

$\begin{cases} 0, & x < 0 \\ x^2/2, & 0 \le x < 1 \\ 2x - x^2/2 - 1, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases}$ 

3.8. (a) k = 1, (b) 3/2, (c) 1/2. 

3.9. (a) 1/8, (b) 7/16, (c) 5/8. 

3.10. (a) 1/2, (b) 1/4, (c) 1/2, (d) 3/4. 

r4$/5, 0cc<i 

3.11. f(%)— \ 2(3 —a?)/5, l<z<2 

1^0 elsewhere. 

fO, z<-2; (2) 3/8; (3) 5/8 

1/8, -2<a< —1 

3.12. (1) F(z)= J 3 / 8 , — l<z<0 

j 6/8, 0<«<2 

(l, a> 2 - 

3.15. 50. 316 ‘ $ L 

3.17. 2 02. 318 2 23 * 

3.20. (1) 0 ; v 7/2, (2) 1/0 ; 1/0- 
3.24. (a) Binomial, w=10, p = 1/2 

(6) Binomial, w=4. p = 2 /3 
(c) Binomial, ft = 3, p = 3/o. 

3 25. (e 2# _l)/£ 2 . 

3^7! (1) 0/2 ; (2) 0-20 ; (3) 0/2 ; (4)0 ; (5) 0/2 ; (6) 0/J lZ 

3-29. (1) 2 ; (2) \/4.8. 

3-30. (1)9/10; (2) 39/40; (3) 89/90. 

3.31. (1)3/8; (2)1/4; (3) a/4. 


Chapter 4 

4.1. (a) 0.1631 ; (b) 0.9692 ; (c) 0.3087. 

4.2. (a) 0.2344 ; (b) 0.6563 , (c) 0.6562. 

4.3. 0.0148. 4.5. 0.0881. 

4.6. $10/3^5 = 0.0412$. 4.8. (a) 0.1513 ; (b) 0.5897. 

4.9. (a) 0.0805 , (b) 0.8110. 

4.10. 0.00513, 0.0071 ; 0.0285, 0.0164 ; 0.0775, 0.0409 ; 0 

0.0683 ; 0.1800, 0.0854 ; 0.1843, 0.0854 

4.12. 0.00096, 0.00106. 

4.13. (a) Binomial, N = 5, p = 1/2 ; 

(b) Binomial, N = 3, p = 1/3 ; (c) Poisson, $\lambda = 2$. 

4.14. $\lambda$ ; $\lambda$. 4.15. (a) 0.08854 ; (b) 0.2438. 

4.16. (a) 0.0323 ; (b) 0.000. 

4.17. (a) 0.0000 ; (b) 0.9706. 

n 

4.18. ( a ) Z e tx ifn ; (6) pe^l-qe*)-', q = l-p, 

i —1 

(1) Geometric, # = 1/5 ; (2) Geometric, p = 1/2. 

4.19. 0.0000. 4.20. (a) 0-0040 ; (6) 0-1005. 

4.21. 0-00001. 4.22. — (1— p)/p log p. 

4.24. (a) (e^-e tot )/«(£— a) ; (6) (l-flf)-i. 

4.25. (a) Gamma, a=2, £=3 ; (6) Exponential, 0=1, 

4.31. (l)e- 2 ; (2) 26' 1 . 4.32. e' 3 /* ; 

4.33. e' 125 . 4.34. 122g- 1( >. 

4.35. (a) (1/2) eP 2 ' 2 ; (6) eP 2 (2a eP 2 -l)/4. 

4.36. (a) op/(P-l) I (6) a 2 p/(p-l) 2 (^-2). 

4.38. (a) 0.999 ; (b) 0.1062 ; (c) 0.9938 ; 

(d) 0.0000 ; (e) 0.0000. 

4.39. (a) 1.96 ; (b) 2.58. 4.40. (a) 1.96 ; (b) no. 

4.41. 0.9772 ; 70.66. 4.42. (1.9804, 2.0196). 

4.43. (1) 0.9772 ; (2) 0.3811. 4.44. 0.0456 ; <0.25. 

4.48. $f(x) = \sqrt{2/\pi}\, e^{-x^2/2}$, $0 < x < \infty$ and 0 elsewhere. 

4.50. /(</) = r 1/3, l<</<2 

I (l/V^"l-l/3)/2, 2<i/<10 


0.1380, 


, V. 0 elsewhere. 

4 51. (a) N(/x,=0, a = l) ; ( b) Gamma, a—2, ^=3/2 ; 

(c) Exponential, 0 = 5 ; ( d) Poisson, A = 2. 

4 52. (a) Binomial, p = 2/3, N = 5 ; (6) Geometric ; (c) Rectangular; 

(d) N(/x = 2, a = V2). 

Chapter 5 

5.1. (1) X —number of red marbles in the outcomes ; Y—num¬ 
ber of white marbles ;/(0, 3)=3 3 /8 3 ;/(l, 2)==3 2 52/83 • 
/(2 ; l)=3 2 .5/8 3 ; /(3, 0) = 5 3 /8 3 and 0 elsewhere. 
X —number of red marbles, Y —number (red-white) ; 

(2) f (0 , —3)=3 3 /8 3 ,/(l, —l)=3 2 .5 a /8 3 ,/(2, l)=3 3 .5/8 3 ,/(2, 3) 

=5 3 /8 3 . 

5.2. (1) /(2, 0)=l/36,/(3, 1)=1/36 , /(12, 0) = l/36 

(2) F(2, 0)=l/36, F(3, — 1)=2/36,..F(£, y) — \ for *^12, 
2 /><>. 

( 3 ) f(l)=l/ 6 . /( 2 )=l/ 6 ,..., /( 6 ) = l /6 and /(*)=0 elsewhere 
y(l) = l/ 6 ,?( 2 ) = l/ 6 ,..., 0 ( 6 )= 1/6 and 0 (x)=O elsewhere. 

(4) /(I | i/=4) = 1/6. jf(6 | j/=4)=l/6 and /(%)=0 else- 

where. 

5.3. (1) F(— 1, 0)=1/15, F(—1, 1)=4/15, F(-l, 2) = 6/15 

F(0, 0)=3/15, F(0, 1)=8/15, F(0, 2) = 11/16 
F(l, 0)=4/15, F(l, 1)=10/15, F(l, 2) = 15/15 
and F(a;, y) = 1 for a>l. ?/>2. 

(2) /(*)= [ 6/15, z = — 1 

5/15, #=» 0 
4/15, z=l 
0 elsewhere. 

(3) /(* | «/=2) = f 2/6, -1 

1/5, a; = 0 

2/5, a;=2 and 0 elsewhere. 


5.4. f{x)=( 3/12, z=0 
9/12, *-=1 

0 elsewhere 


0 («/) =•■ 


f 3/12, 2/ = 0 
3/12, 2/ = l 
6 / 12 , y =2 

0 elsewhere. 


/(?/ | a=0)=f 1/3, y=0 
1/3, 2/=l 


1/3. 77=2 and 0 elanwhera 


5.5. (1)8/105; (2) 2(*+l)/3, 0<*<1 ; (3) 2(z+3)/7, 0<z<l. 

5.7. (1) 6/15 ; (6) e' 1 ' 2 . 5.8. 15/256. 

5.9. 1/2-5 e- 2 /6. 5.10. g/lO-e-He -10 / 10 * 

5 11. (1)1; (2)1/2; (3)0; (4)0. 

5.12. (1) 2/9 ; (2) 1/9 ; (3) 0 ; (4) 0. 
5.13. 0. 5.19. 3/7. 

In ; Np t N(N-l)i>i* ; 

5 - Z1 * ^ N(N— l)PiPti' 

5.22. 0‘8994. 

5.24. /*i=10 ; ft a =15 ; cxi=2 ; a a =3 ; p = l/2. 

5.25. (1) 0*9109; (2) 1—e- 3 / 2 . 5.27. (a) yes ; (6) yes. 

5.28. (L) Pi+2/as, cti 2 +4ct 2 2 , oi 2 +4ct 2 3 +4 cov (x, y). 

(2) fti—/u 2 » ®i 2 +ct 2 2 , <Ji 2 -hcT 2 2 —2 cov (re, y). 

5.29. Var(X)—Var(Y). 

5.30. Bivariate, /x 1 =0 = /x 2 , cji=a 2 = l, p = 0. 

5.33. N(/i, o/V w )- 

Chapter 6 

6.3. (1) Binomial, N = 55, p = l/2 ; (2) Poison, A = 11. 

6.4. /(y) = r4y/6 2 , 0<y<e/2 

\4 (0-y)l6 2 , 6/2<y<6 and 0 elsewhere. 

6.5. Gamma, (3 = 3, a.—n[n-\-l)a/2. 

6.6. (a) N(/xi—2^2+/ i 3)\A<7 1 2 -|-4a2! 2 -l- G 3 2 ) ; 

( b ) N^i + ^2—5/a 3 ,V^i 2 + ct 2 2 + ^5o3 2 ) 

N N 

6.7. Ep t Epi{ l—fi) 

i=l i = 1 

N / TVJ \ N —n — n[x— ix) 2 /2ct s . 

0-8. 27 ( N W*(l—p) .(»/2TC) 1 / 2 e 

n=0 V M / 

6.9. (1) /tNp ; (2) Npa 2 +/x 2 N^(l—^). - 

6 . 10 . N(ju, a). 

6.11. 2 = 2.4, y= 2*4, S! 2 =3.44, s 2 2 =23.84, r=0.9767. 

6.12. (l)X=(Xi+X a )/2, Y=(YH-Y 2 )/2 ; 

(2) X (X,-X) 2 /2 £ (Yj—Y) 2 /2 ; (3) 27 (X«-X)(Y«-Y)/2. 


6.13. 

(ai 2 /wi + a 2 2 /w 2 + 4<7 3 2 / ft.3) 1 ' 2 . 

6.14. 

<y(l/w 1 4-l/w 2 ) 1/2 . 

6.15. 

>0.7333. 

6.16. 

<(16.6/15) 1 / 2 . 

6.17. 

1 . 

6.18. 

0 . 0000 . 

6.19. 

>0.9756. 




0.20. /(w 1 )=w((3—Mi) n ' 1 /(P — a ) n . a<Wi<P 
0(w 2 ) =«(w n — a)"" 1 /!(3 —a) w , a<w n <(3. 

6.21. / l R)=n(n—l)R M - 2 [p-a)-R]/((3-a)«, 0<R<(3—a. 


$ 22 (2?i+l) ! 

(» !) 2 


7f® w 

( j /(s)da^ ^ | /(a:) <Zrc ^ /(m), 
1 -( g - / z ) 2 / 2(j2 j __co<?W<C50. 

where A®)— 

fi 24 (l/0)e _R / 0 , 0<R<oo (n = 2). 

s :rr< 2 ) *-* <■> °- m ; <6) on5 - 

6.35. (1) 6 ; (2) 0.9664. 


6.34. (o) 0.025 ; (6) 0.115. 


Chapter 7 

7.1. 1 approx. 
7.2. 0.0047. 
7.3. 0.9266. 
7.4. (33.37, 36.63) ; no. 
7.5. (33.56, 51.44), ($-\infty$, 50). 
7.6. 0.9936. 
7.7. (-3.658, -0.342) ; no. 
7.8. accept. 
7.9. accept. 
7.12. 0.99. 
7.13. 0.5 approx. 
7.14. (9.11, 42.6). 
7.15. 1 approx. 
7.16. 0.95 approx. 
7.17. (9849.2, 10150.8). 
7.18. yes at 98%. 
7.20. yes at 95%. 
7.21. 0.0001 approx. 
7.22. 0.0287 approx. 
7.23. 31.4. 
7.24. 0.4 approx. 
7.25. , ' m V r (~2~^ r ) 
.■UMt-0 
for v>2 


\ 

r ( 2 

r\ 


n \ 
2" / 


rmrl elf) not exist if 


7.27. 

0.9 approx. , P k Fi 4 , i 9 <2.04). 



Chapter 8 


8.1. (80.2282, 85.7718). 
8.2. (64.2621, 65.7379). 
8.3. (9930.2619, 10069.7381). 
8.4. (4.9883, 5.0317) ; (1) no ; (2) yes. 
8.5. ($ 0.95, 1.05). 
8.6. (26.8093, 73.1907). 
8.7. (0.526, 0.574). 
8.8. (6.5888, 13.4112). 
8.9. (-2.4231, 4.4231). 
8.10. (18.5359, 21.4641). 
8.11. (-8.4552, -1.5048). 
8.12. (0, 2599). 
8.13. (412, 1588). 
8.14. (-4.62%, 2.62%). 
8.15. (-14.51%, 4.51%). 
8.16. (0.155, 0.299). 
8.17. (14.7, 75.2). 
8.18. (2.38, 8.58). 
8.19. (184.5, 673.5). 
8.20. ( s /[li 2: «/2 /V 2 ^]) 



Chapter 9 

0 j 9.2. 5. 

p = j/S and S is obtained numerically from 

(S log*<-» l0 S *> +’ 1 log “ _K ^ l0g r (11)=0 ' 

9.4. * 7000. 9 5 - ® 17 - 5 ' 

, 6 J=min (*i.*.) anfl *. )• 

9-6 ' S ,9~ 11/(1-J) ' {—1+V 1-4 ™ S og r, ^ 2 ‘ 

9.8. 2+X(X-l)/N(N-l). 9- 14 ^ 

9.15. 0i I not both. 

9.23. (1) 2X ; (2) X X*; (4) ^ Xj. 

Chapter 10 

10.1. 8-{(Q. G). (G, B). (B, G), (B. B)}; 0-{(G. &)}■ 

10.2 S-«*. V) \ -°°<*<°°- -°° <y< ^ X: y) | («+„)/*>#>. 

10.3. a=l/8 ; (3=19/27. 

X0.4. C = {z | »>15} ; a = e' 3 ; (^=l-c . 

10.5. 5. 


10 6. reject H 0 if (a) ws 2 /a 0 ^ ^ > 

(b) 3i>G ot where P{gamma with a = «, P= 0 /™ 1S > G 4 = a *’ 

(c) x^k» where 27 2>o !B ^ a ' 

10.8. a=0.14, (3 = 0.717. 

•0.10. 0 ; 1/5 ; 16/45 ; 21/45 ; 24/45 ; 24/45 ; 21/45 ; 16/45 ; 1/5 ; 0. 
10.18. (1) reject ; (2) reject. 10.20. accept. 


10 . 8 . 

10 . 10 . 

1018. 

10 . 22 . 

10.26. 

10,30. 

10.34. 

11 . 2 . 

11 . 6 . 

11 . 8 . 


reject. 

accept the claim, 
reject. 


10.24. reject. 

10.28. 18. 

10.32. 0.7974; 0.4119 ; 0.0171. 
10.36. reject. 

Chapter 11 

11.4. not a good fit. 


no. 

There is evidence of independence ; £> = 0.28. 
Chapter 12 

12.2. (1) 1 ; (2) 2 ; (3) $\infty$. 

12.4. (1) a = -0.007, b = -0.475, c = -0.05 ; 
(2) a = 1.10, b = 21.18 ; (3) a = 4.86, b = 0.106. 

12.6. (1) 8/15. 

12.8. $a = \mu_1 - b\mu_2 - c\mu_3$, 

$b = \dfrac{\sigma_{12}\sigma_3^2 - \sigma_{23}\sigma_{13}}{\sigma_2^2\sigma_3^2 - \sigma_{23}^2} = \dfrac{\sigma_1}{\sigma_2} \cdot \dfrac{\rho_{12} - \rho_{13}\rho_{23}}{1 - \rho_{23}^2}$, 

$c = \dfrac{\sigma_{13}\sigma_2^2 - \sigma_{23}\sigma_{12}}{\sigma_2^2\sigma_3^2 - \sigma_{23}^2} = \dfrac{\sigma_1}{\sigma_3} \cdot \dfrac{\rho_{13} - \rho_{12}\rho_{23}}{1 - \rho_{23}^2}$. 

12.10. a = 1430.0, b = -61.6, c = -11.4. 

12.12. £{Xi —/>0 2 / M * 12.14. 7.5. 

12.20. significant. 12.26. not significant (methods). 

12.28. not significant (methods). 

12.30. no, without more assumptions. 

12.34. not good. 12.36. accept. 

12.40. accept. 



INDEX 


A 


Absolute value, 84 

Acceptance sampling, 227 

Algebra of sets, 47 

Almost everywhere, 83 

Analysis of covariance, 372 

Analysis of variance, 363 

Analysis of variance technique, 362 

Associative law, 49 

B 

Bar diagram, 79 

Bayes’ procedure, 354 

Bayes’ theorem, 71 

Bilinear forms, 33 

Binary operators, 47 

Binomial coefficients, 42 

Binomial probability situation, 123 

Binomial proportion, 123 

Bimodal, 108 

Bivariate normal surface, 196 

C 


Categorized data, 330 
Central limit theorem, 219 
Central tendency, 106 
measures of, 107 
Chance variables, 76 

Change of variables, 116, 203 
Chebyshev’s theorem, 112 
Coefficient of association, 340 
Combinations, 38 
Commutativity, 49 

Completely randomized experiment, 
374 

Concomitant variable, 371 

Confidence interval for, means, 254 
difference between means, 258 


proportions, 263 
variance, 265 
Confidence regions, 268 
Completeness, 116, 283 
Consistency, 278 
Consumer’s risk, 228 
Contingency table, 337 
Control charts, 270 
for means, 270 
for proportions, 271 
Convolution, 211 
Correlation coefficient, 185 
linear, 185 


product moment, 185 
Countable number, 83 
Covariance, 184 
Critical region, 291 
Cumulants, 105 


D 


Decile point, 105 
Density function, 83 
Designs, 372 

Balanced incomplete blocks, 372 

Factorial, 374 

Graeco-Latin square, 374 

Incomplete block, 372 

Latin square, 372 

Lattice, 372 

Nested, 374 

Partially balanced, 372 
Randomized block, 372 
Split-plot, 371 
Youden square, 374 
Determinants, 24 
Cofactor of a, 26 
Minor of a, 27 
Differential operator, 91 
Dispersion, 108, 185, 342 
Principle of minimum, 344 
Discontinuity, 83 
Distribution-free property, 114 
Distribution, 81 
Beta, 121 

Binomial, 102, 119, 122 
Bivariate normal, 190 
Cauchy, 121 
Chi-square, 122, 234 
Conditional, 177 
Cumulative, 81 
Discrete geometric, 120, 143 
Discrete uniform, 120 
Exponential, 120, 147 
Function, 81 
F-distribution, 124, 244 
Gamma, 120, 148 
Gaussian, 121, 153 
Hypergeometric, 119, 137 
Joint, 173 
K -variate, 172 
Logarithmic. 170 
Lognormal, 164 
Marginal, 174 
Multinormal, 189 
Multivariate, 172, 197 
Non-central F, 246 
Non-central t, 242 
Normal, 121, 153 


Pareto, 164 
Poisson, 9^, 120, 130 
Power series, 169 
Rectangular, 99, 145 
Sampling, 214 
Student t, 121, 239 
Uniform, 99, 120 
E 

Entropy of a finite scheme, 73 
Estimable parametric function, 278 
Estimation, 

Interval, 251 
Point, 273 
Estimator, 277 
Unbiased, 277 

Minimum variance unbiased, 279 
Asymptotically most efficient, 279 
Events, 

Elementary, 5 
Impossible, 6, 63 
Sure, 6 

Mutually exclusive, 51 
Non-occurrence of, 52 
Complete system, 73 

F 

Family of probability functions, 89 

Finite scheme, 73 

Frequency, 

Relative, 63 
Function, 54 
Beta, 121 

Characteristic function, 103 
Domain of a, 56 
Gamma function, 120 
Range of a, 56 
Step, 82 

Fourier transform, 169 

G 

Goodness of fit, 331 

H 


Histogram, 79 
Hypothesis, test of, 


I 


Identity, 

Element, 8 
Matrix, 18 

Independence, 

of events, 69 
Mutual, 69 
Pairwise, 69 
Stochastic, 198 
Information, 74, 281, *85 
Integral operator, 91 
Interaction, 371 
Interval, 

Estimation, 251 

cfiWtive, 254 

Shortest on the average, 254 
Shortest unbiased, 254 

Invariance, 284 
Inverse, 8 

of a matrix, 30 


K 

Kolmogorov-Smirnov statistic, 377 
Kurtosis, 111 


L 

Lambda criterion, 303 
Laplace transform, 169 
Latin square, 303 
Leptokurtic, 112 
Linear equations, 11 

Homogeneous system of, 15 
Non-homogeneous, 15 
Linear representation, 15 
Logarithms, 73 
Long run, 112 


M 


Mapping, 55 
Matching problem, 46 
Matrices, 

Addition of, 14 
Commutative, 15 
Multiplication of, 15 
Non-singular, 17 
Scalar multiplication of, 14 
Singular, 17 
Matrix, 

Elementary, 24 
Idempotent, 33 
Multinomial coefficient, 44 
Null, 12 
Rectangular, 12 
Square, 12 
Stochastic, 13 
Transpose of a, 12 
Mean absolute deviation, 96 
Mean deviation, 96 
Measure, 58 

Probability, 58 
Median, 106 
Mesokurtic, 112 

Method, 

Minimax procedure, 

Multiple decisions, 324 
Method of maximum likelihood, 274 
Mode, 107 
Model, 

Mixed effect, 374 
Random effect, 374 
Variable effect, 374 
Moments, 92, 184 
Absolute, 96 
Central, 94 
Factorial, 98, 105 
Raw, 93 

Multimodal, 108 
Multinomial expansion, 44 

N 

Negative definite, 33 
Semidefinite, 33 
Neyman-Pearson lemma, 296 
Normal equation, 367 
Normal probability, 161 
Normal regression lines, 195 
Nuisance parameter, 257 

O 

Observations, 12 
Occupancy problem, 47 
One-way classification, 856 
Operating characteristic, 227 
Order-statistic, 221 
Ortho-normal basis, 10 
Outcome set, 3 

P 

Pascal’s triangle, 42 
Pearson’s measure of contingency, 340 
Pearson curves, 121, 153 
Percentiles, 105 
Permutations, 36 
Pie diagrams, 80 
Pictogram, 80 
Platykurtic, 112 
Points of location, 105 
Poisson probability model, 119 
Population, 2 
Bivariate, 2 
Continuous, 3 
Discrete, 3 
Finite, 2 
Multivariate, 2 
Positive 

Definite, 33 
Semidefinite, 33 
Power curve, 228 
Probability, 35 

Conditional, 65, 68 
Distribution, 100 
Function, 77 
Personal, 63 
Posterior, 71 
Prior, 71 

Probability integral transformation, 

168 

Producer’s risk, 228 
Product, 

Inner, 10, 20 
Logical, 50 


Q. 

Quadratic forms, 38 
Quartile points, 105 

R 

Random effect model, 374 
Random variables, 76 
Randomized block experiment, 372 
Range, Interquartile, 109 
Rank 

Column, 17 
Row, 17 

Replication, 372 
Regression, Linear, 348 
Regularity conditions, 282 
Relative efficiency, 279 
Root mean square deviation, 95 

S 

Samples, 

from a multivariate population, 12 
Representative, 5 
Sampling, 209 
Sample mean, 215 
Variance, 215 
Scalar, 

Quantity, 7 
Multiplication, 7 
Multiplication of vectors, 20 
Scatter diagram, 347 
Schwartz inequality, 20 
Semi-invariance, 105 
Sequential procedures, 324 
Sets, 1 

Disjoint, 51 
Elements of, 1 
Intersection of, 50 
Ordered, 7 
Set functions, 57 
Additivity of, 57 
Total additivity of, 57 
Shortest on the average, 254 
Skewness, 110 
Space, n-dimensional, 20 
Square contingency, 340 
Standard deviation, 94 
Standard error, 217 
Statistic, 214 
Stirling’s formula, 0 
Stochastic process, 209 
Stochastic variable, 76 
Discrete, 76 

Linear combination of, 211 
Standardized, 9s 
Subsets, 4 

Subsets and samples, 4 
Sufficiency, 280 
Joint, 281 
Minimal, 282 
Sum, logical, 47 


Variance stabilizing, 249 
Trace of a matrix, 33 

U 


T 


Test, 

Admissible, 249 
Best, 292 

Concerning difference between 
means, 312 

Concerning means, 310 
Concerning proportions, 322 
Concerning variances, 317 
Distribution-free, 380 
Kruskal-Wallis, 384 
Likelihood ratio, 302 
Mann-Whitney, 382 
Non-parametric, 380 
Power of, 293 
Rank, 382 
Run, 382 
Sign, 381 
Similar, 299 
Unbiased, 299 

Uniformly most powerful, 294 

Transformations, 

Elementary, 24 
Inverse sign, 249 
Linear, 23 
Orthogonal, 23 
$\tanh^{-1}$, 249 
Square root, 249 


Unbiased coin, 62 
Unimodal, 107 
Union of sets, 47 
Utility, 63 


V 


Variance, 94 
Variate, 76 
Vectors, 

Addition of, 8, 20 
Column, 7 
Independence of, 9 
Length of, 10, 20 

Linear combination of, 10 
Norm of, 10 
Normalized, 8 
Null, 7 
Order of, 7 
Row, 8 
Size of, 7 
Unit, 9 
Vector space, 

Basis of a, 9 
Dimension of a, 9 
Rank of a, 9 
Spanned, 9 
Venn diagrams, 48 